Senior Solutions Architect, AI Cluster Performance and Telemetry

$184k - $287.5k

NVIDIA, Santa Clara, CA

Senior Solutions Architect Specializing in Data Center Systems & Performance

We are looking for a Senior Solutions Architect specializing in Data Center Systems & Performance to join our elite solutions architecture team. In this role, you will work at the intersection of groundbreaking hardware and complex software stacks. As a Solutions Architect, you will serve as a key technical expert connecting engineering, field teams, and customers with highly demanding requirements. You will be responsible for analyzing and optimizing the performance of world-class AI, deep learning, and HPC ecosystems. Come join us!

What you'll be doing:

  • Work together with our partners and customers to identify, analyze, and resolve complex performance bottlenecks across interconnected GPU, CPU, and networking systems.
  • Build and maintain robust performance benchmarking suites to stress-test high-performance clusters and establish performance baselines.
  • Apply industry-standard performance tools to monitor hardware performance counters and extract deep system telemetry.
  • Deeply investigate system and software configurations to find and fix subtle discrepancies that impact peak performance.
  • Partner closely with internal engineering teams, external collaborators, and customers to jointly develop solutions and improve infrastructure performance.

What we need to see:

  • BS or MS in Engineering, Electrical Engineering, Physics, or Computer Science (or equivalent experience).
  • 8+ years of work-related experience in the high-tech industry, particularly in system build, performance analysis, and technical customer-facing roles.
  • A strong understanding of how CPUs, GPUs, and high-speed networking fabrics interact within massive clusters.
  • Practical experience with performance counters, profiling tools, and telemetry collection systems (e.g., Perf, eBPF, Prometheus, Grafana).
  • Practical experience working with containers, cloud provisioning, and scheduling tools such as Docker, Docker Swarm, Kubernetes, SLURM, Ansible.
  • Proven track record of transforming raw logs and telemetry into structured time series data, dashboards, and heat maps.
  • The ability to translate complex, low-level technical performance anomalies into clear, actionable narratives for cross-functional teams.
  • Strong collaborative skills and a proven history of building successful relationships across diverse engineering and operations teams.

Ways to stand out from the crowd:

  • Deep knowledge of multi-GPU communication libraries like NCCL, and how they optimize inter-GPU topologies.
  • Deep, hands-on experience working directly with NVIDIA hardware architectures, NVLink, NVSwitch, or NVIDIA Nsight tools.
  • Practical experience optimizing distributed AI training workloads, LLMs, or large-scale high-performance computing environments.
  • Experience developing or integrating Agentic AI frameworks to autonomously parse telemetry logs, diagnose configuration drift, or automate cluster triage.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 8, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes. NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 13 hours ago