Platform Support Architect

$175k - $200k

DataDirect Networks Inc

Job Locations: US-CA-San Francisco - Remote
Job ID: 2026-5845
Name Linked: Remote: San Francisco, CA
Country: United States
City: San Francisco - Remote
Worker Type: Regular Full-Time Employee
Posting Location (State/Province): CA

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, Government, academia, research and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

DDN is expanding our Enterprise and Sovereign AI Solution offerings, for example HyperPOD - a turnkey NVIDIA AI Data Platform built on DDN Infinia storage, NVIDIA AI Enterprise (NVAIE), and Supermicro reference hardware, optimized for inference and RAG workloads. Our support organization is deep on storage (Infinia, EXAScaler); we are now hiring an AI platform specialist to lead supportability and enablement for the AI side of the stack - NVIDIA AI Enterprise services (NIMs, NeMo, Triton, GPU Operator, licensing), vector databases (initially Milvus), RAG/agentic workflows, and the high-performance storage and networking fabric that underpins them.

You will be a trusted technical advisor within Support and across OEM and NVIDIA partner teams, combining the mindset of a solutions architect (architecture, reference patterns, PoCs, reusable assets) with that of an L3 support engineer. You'll help DDN and our partners operate AI Data solutions as a cohesive AI platform, not just a collection of components.

Key Responsibilities

Platform support
  • Act as the primary NVIDIA AI Enterprise and vector database solutions expert for HyperPOD customer environments, bringing deep knowledge of NVAIE services (e.g., NIMs, NeMo, Triton, TensorRT/TensorRT-LLM, GPU Operator, licensing/NLS) and vector databases (e.g., Milvus) to guide diagnosis, optimization, and solution design.
  • Own complex end-to-end triage across GPU, NVAIE services, vector DB, Kubernetes, Docker, high-speed networking, and Infinia storage, distinguishing product defects from environmental and integration issues.
  • Diagnose and resolve performance bottlenecks in RAG and agentic AI workflows, from model selection and prompt/RAG configuration through to vector search, GPU utilization, and data access patterns.
  • Collect and interpret logs and telemetry across Linux, containers, Kubernetes, GPU stack, vector DB, and storage/networking; build minimal repros and high-quality defect reports for escalation to NVIDIA, vector DB vendors, OEMs, and internal engineering.

Runbooks, diagnostics, and supportability
  • Author and maintain support triage runbooks and checklists for HyperPOD covering NVAIE services, Milvus/vector DB, GPU stack, Docker, Kubernetes resources, and their interaction with Infinia and the network fabric.
  • Define and validate unified diagnostics bundles that capture the right logs/configs/metrics from all relevant layers (Infinia, GPUs, NVAIE, Milvus, Kubernetes, network) to enable fast problem isolation and high-signal escalations.
  • Collaborate with observability and tools teams to shape Prometheus/Grafana/ELK/NetQ or equivalent dashboards that surface both platform health and RAG/service-level metrics (e.g., TTFT, retrieval latency, error rates, throughput).

Enablement, PoCs, and reusable assets
  • Build hands-on labs and PoCs that mirror customer RAG and agentic AI use cases on HyperPOD, validating supportability and capturing "known good" configurations and troubleshooting patterns.
  • Develop reusable technical assets - implementation guides, best-practice playbooks, tuning checklists, example architectures - to accelerate time-to-value for customers, PS, and Support.

Design feedback, readiness, and cross-functional leadership
  • Provide structured feedback from early field cases and PoCs into Product Management and Engineering on stack compatibility, upgrade order, rollback constraints, and observability needs for NVAIE, Milvus/cuVS, Infinia, and networking.
  • Collaborate closely with NVIDIA solutions architects, OEM architects, PS, and Support Innovation to align reference architectures and best practices with real-world support experience.

Required Experience & Skills

Technical
  • 5+ years in Linux-based infrastructure roles (SRE, MLOps, platform engineering, or L2/L3 support) supporting production systems; 8+ years total technical experience preferred.
  • Strong hands-on experience with containers and Kubernetes (Docker/containerd, Helm, Operators; debugging pods, DaemonSets, CSI, CNI, and ingress/load balancers).
  • Demonstrated experience operating GPU-accelerated workloads in production:
    • NVIDIA GPUs, drivers, CUDA concepts, GPU utilization/perf triage
    • NVIDIA GPU Operator and Kubernetes-based GPU lifecycle management
    • Familiarity with DGX / HGX or similar GPU cluster platforms.
  • Practical experience with AI storage and networking for HPC/AI clusters:
    • High-performance storage systems (e.g., EXAScaler/Lustre, GPFS, Ceph, distributed object storage, enterprise NAS/SAN).
    • RDMA-accelerated and/or high-speed Ethernet/InfiniBand networking, including fabrics, switch topologies, and large-scale deployments.
    • Hybrid cloud or cloud-adjacent patterns (Kubernetes CSI, cloud-native fabrics, data locality).
  • Experience with one or more vector databases (Milvus, Qdrant, Pinecone, pgVector, OpenSearch/Elasticsearch vectors, etc.), including schema design, ingestion, and operations.
  • Solid understanding of RAG and Generative AI workflows: embeddings, retrieval, reranking, prompt design, context management, and how these interplay with vector search and GPU inference at scale.
  • Familiarity with NVIDIA AI Enterprise components and toolchain, for example:
    • NVIDIA NIM inference microservices
    • NVIDIA NeMo framework / NeMo Retriever / NeMo Curator
    • Triton Inference Server, TensorRT / TensorRT-LLM, CUDA libraries
    • NVIDIA blueprints for enterprise RAG and agentic AI.
  • Experience designing, operating, or supporting MLOps / GenAI pipelines: CI/CD for models, deployment strategies, canarying/rollback, GPU resource management, monitoring and alerting for AI services.
  • Strong diagnostic skills across Linux, containers, Kubernetes, GPUs, storage, and networking; able to quickly narrow fault domains and propose experiments or configuration changes.

Support, architecture, and stakeholder skills
  • Track record of building reusable technical assets (runbooks, KBs, implementation guides, benchmarks, PoC templates) that improve support readiness and partner/customer success.
  • Excellent communication skills, capable of clearly explaining complex AI platform topics to both engineers and executive stakeholders, internally and with partners.

Preferred Qualifications
  • Prior experience with scale-out storage in GPU/AI environments.
  • Direct experience crafting and operating RDMA-accelerated HPC/AI clusters at scale, including spine-leaf or fat-tree network designs and large switch/router deployments.
  • Hands-on work with NVIDIA reference blueprints (Enterprise RAG, VSS, AIQ, industry-specific blueprints) or similar enterprise AI architectures.
  • Familiarity with AI observability and responsible AI practices (guardrails, monitoring for drift/toxicity, basic understanding of regulatory considerations like GDPR/HIPAA in the context of AI systems).
  • Experience with observability stacks (Prometheus, Grafana, Loki/ELK, NetQ, etc.) tuned for AI workloads, including service-level dashboards and SLOs.

What Success Looks Like in This Role

Within 6-12 months, a successful AI Data Platform Solutions Architect will have:

  • Become the go-to internal expert for "how this AI and networking stack actually works in production" across Support, PS, Product, and NPI for HyperPOD.
  • Driven the speed and quality of support at the solution level for NVAIE, vector DB, and AI-workflow issues through high-quality diagnostics, architecture insight, and well-defined "golden stack" patterns.
  • Established clear, repeatable triage and escalation patterns for AI-side incidents that L1/L2 storage engineers can follow with confidence.


Salary Range for this role: $175,000 - $200,000

DDN

DDN has a very strong orientation towards these 4 characteristics and any successful employee will demonstrate these capabilities:

Self-Starter - Takes independent action to identify and solve problems. Seeks out relevant information needed to make decisions. Gets involved with new initiatives.

Success/Achievement Orientation - Delivers quality results consistently. Targets, achieves (or exceeds) measurable results. Sets challenging goals, focuses on critical priorities, and is accountable.

Problem Solving - Recognizes problems and responds with a systematic assessment that identifies and addresses cause of issue. Practical, realistic, and resourceful.

Innovative - Builds and improves key business processes that enhance the effectiveness of DDN. Generates new ideas, challenges the status quo, and solves problems creatively.

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

Vacancy posted 3 hours ago