Senior AI Infrastructure Engineer, Large-Scale GPU Clusters
NVIDIA
NVIDIA Corporation in Santa Clara is seeking a Senior Software Engineer to lead the optimization of large-scale AI systems. This role will involve profiling and tuning workloads using cutting-edge NVIDIA technology. The ideal candidate will have over 8 years of experience in software infrastructure for AI systems, with expert-level programming in Python and C/C++. Responsibilities include leading the debugging process of multi-GPU environments and mentoring less experienced engineers.
- ...NVIDIA in Santa Clara is seeking a Senior Software Engineer to lead the optimization of distributed training across large-scale GPU platforms. Candidates should have substantial experience in AI applications and technical leadership. This role involves profiling... Senior
- NVIDIA is seeking highly motivated EngOps and Platform Engineers to develop automated tools for managing large GPU clusters. This position requires strong expertise in high-performance computing and deep learning. The ideal applicants have a BS or MS in a relevant... Senior
$356.5k
...NVIDIA is seeking an experienced AI infrastructure software engineer to join its DGX Cloud AI Efficiency Team in Santa Clara, California. This role focuses on developing the infrastructure for optimizing AI workloads and ensuring high availability and efficiency... Senior
- ...'s DGX Cloud AI Efficiency Team... ...to the infrastructure that powers our... ...resources and scale to foster innovation... ...software engineer to join our... ...that enable large‑scale AI training... .... As a senior DGX Cloud AI... ...large‑scale clusters. Experience in... ...Visualization. The GPU, our... Senior
- ...passionate, and dedicated Senior AI Infrastructure Engineer to join our DGX Cloud group... ..., build and maintain large‑scale production systems with high... ...Engineer at NVIDIA ensures our GPU cloud services deliver... ...multi‑GPU and multi‑node clusters. Engage in and improve the... Senior
- Google Inc. in Sunnyvale, CA is looking for a Software Engineer to develop next-generation technologies crucial to Google’s operational... ...needs. The ideal candidate will have experience with large-scale infrastructure and distributed systems, along with proficiency in... Senior
- ...NVIDIA is seeking a Senior Network Engineer to develop and manage a robust cloud network infrastructure. You will lead the design and implementation of large-scale L3 networks across data centers and corporate IT. Ideal candidates will have over 8 years of networking... Senior
- ...technology company is seeking a Software Engineer to develop next-generation... ...and debugging complex issues across large-scale systems. Candidates should have a strong... ...Join a dynamic team at the forefront of AI and infrastructure innovation. Google Inc... Senior
$272k - $425.5k
Principal Software Engineer – Large-Scale LLM Memory and... ...serving generative AI and reasoning... ...Dynamo orchestrates GPU shards, routes requests... ...heterogeneous clusters so that many... ...memory pools. Mentor senior and junior... ...storage, or ML systems infrastructure in C/C++ and... Local area, Remote work
- ...and Visualization. The GPU, our invention,... ...our team of innovative engineers who develop and maintain... ...and maintaining large GPU clusters interconnected via NVLink... ...switches, and related infrastructure. Automation expert... ...Proficiency in designing large scale networking... Senior
$176k - $333.5k
NVIDIA Corporation in Santa Clara is seeking experienced EngOps and Platform Engineers to develop and maintain extensive GPU clusters. The role requires extensive hands-on experience with automation tools and a robust understanding of computer networks. The ideal candidate... Senior
- ...NVIDIA is looking for an experienced GPU Deployment Engineer to tackle end-to-end AI deployment challenges on the NVIDIA RTX AI platform. The role involves analyzing GPU-accelerated applications, improving user experiences, and collaborating with teams to influence... Senior
- ...of distributed training frameworks for large models (LLMs, multimodal), resolving scalability bottlenecks at the scale of 10k–100k GPU clusters. Kernel & performance tuning. Work... ...utilization. Training pipeline engineering. Build an end-to-end MLOps platform spanning...
$168k - $322k
...NVIDIA is seeking a Senior AI Platform Engineer to improve engineering efficiency and data security through AI-powered products. The... ...working with Cloud and AI/ML teams to build and scale infrastructure and shape the technological future of the organization.... Senior
$180k - $240k
...We are seeking a Senior AI Infrastructure Engineer to design, build, and scale the high-performance AI... ...Architect and optimize multi-GPU setups, ensuring efficient... ...across H100/A100 clusters. Networking & Hardware... ...Splatting (3DGS) and large-scale training. Intelligent... Senior, Odd job, Work at office
$200k - $322k
...Our invention of the GPU in 1999 sparked the growth... ...ignited modern AI — the next era of computing... ...today. Design‑for‑X Engineering at NVIDIA works on groundbreaking... ...as part of the AI Infrastructure requirements at an org... ...capacity planning for large‑scale mission‑critical Gen... Senior
- Senior Systems Software Engineer - GPU Performance at Scale. We are looking for a dedicated engineer for the Senior Systems... ...will drive innovation in AI and GPU computing. What You’ll... ...of performance practices in large‑scale GPU infrastructure, delivering powerful tools, methodologies... Senior
- ...technical, creative, and Senior AI Platform Engineer to build, support,... ...and lead AI-native infrastructure roadmaps and cross‑... .... Architect and scale LLM/ML infrastructure... ...across cloud‑native clusters and on‑premises hardware... ...model serving, and GPU‑accelerated... Senior
$184k - $356.5k
NVIDIA is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting... ...high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive... Senior
$149.1k - $215.93k
...differentiate, innovate, and scale across AI, cloud, networking,... .... About the Role Senior MLOps & AI Infrastructure Engineer to architect, build... ...Kubernetes and GPU node pools. Develop... ...‑tune, and deploy large‑scale models... ...performance on GPU/TPU clusters. Build and maintain... Senior, Local area, Shift work
- ...NVIDIA in Santa Clara is looking for a Senior HPC Architect to support the deployment of large-scale GPU compute clusters. You will provide engineering solutions for GPU computing products, ensuring technical relationships with teams and assisting in creative solutions... Senior
- ...Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting-edge GPU compute clusters, focusing on deep learning and high-... ...have at least 5 years of experience with large-scale infrastructure, strong programming... Senior
- Google Inc. is seeking a Senior Software Engineer for its Infrastructure team in Mountain View, CA. In this role, you will leverage your expertise in C++, software design, and large-scale systems to develop cutting-edge technologies for Google Ads. Responsibilities include... Senior
- ...Databricks in Mountain View is seeking a Senior Software Engineer to join our Networking Infrastructure team. You will design secure, scalable networking solutions for large-scale compute across clouds. Ideal candidates will have 5+ years in programming languages like... Senior
- ...NVIDIA is seeking a Principal AI and ML Infra Software Engineer to join our Hardware Infrastructure team in Santa Clara, CA. In this role, you'll work closely... ...by addressing infrastructure deficiencies for GPU Clusters, fostering innovations in AI/ML research. The...
$272k - $431.25k
...NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California, to... ...the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with various teams, monitoring infrastructure performance, and implementing improvements...
- NVIDIA Corporation is looking for an HPC Cluster Engineer in Santa Clara, California, to design and operate GPU Compute Clusters for EDA and high-performance... ...will have extensive experience with large-scale compute infrastructure and exceptional skills in automation and... Senior
$184k - $356.5k
...NVIDIA is seeking an experienced engineer to lead GPU cluster design and support for AI and HPC deployments in Santa Clara, California. The... ...candidate will have over 8 years of experience with large-scale GPU infrastructure and a strong ability to communicate complex... Senior
$152k - $241.5k
Senior Software Engineer, Fabric Networking - GPU... ...hardware and software to support large scale computing platforms. Work with... ...an existing vacancy. NVIDIA uses AI tools in its recruiting processes.... Senior, Remote work
$152k - $241.5k
NVIDIA Corporation is seeking a Senior ML Platform Engineer to design and scale high-performance ML infrastructure. You'll utilize IaC techniques with Ansible and Terraform, collaborating closely with ML researchers and ensuring system reliability and performance. This... Senior, Remote job



