Distributed AI Infra Engineer — Multi-GPU Benchmarking

NVIDIA Corporation

NVIDIA Corporation is seeking a Software Engineer in Santa Clara to optimize and benchmark distributed training workloads for AI. The role involves debugging multi-GPU environments and designing automation workflows for large-scale operations. Applicants should possess a Bachelor's or Master's in Computer Science, strong programming skills in Python and C/C++, and 3+ years of relevant experience. NVIDIA promotes a diverse and inclusive work environment, offering competitive salaries and benefits.

Vacancy posted 5 days ago

Similar jobs that could be interesting for you
Based on the Distributed AI Infra Engineer — Multi-GPU Benchmarking in Santa Clara, CA vacancy

Senior AI Engineer - GPU-Driven Multi-Agent Systems
...company in Santa Clara is seeking a Senior High Performance AI Engineer to build groundbreaking multi-agent systems for the CUDA ecosystem. The ideal... ...development, proficiency in C/C++ and Python, and experience with GPU programming. This role offers competitive salaries and...
Suggested
NVIDIA
Santa Clara, CA
2 days ago
Senior AI Infrastructure Engineer, Distributed GPU Clusters
$184k - $356.5k
...Corporation is seeking a Senior Software Engineer in Santa Clara to enhance the... ...and reliability of large-scale AI infrastructures. The role involves... ...leadership in debugging and optimizing distributed training workloads across NVIDIA’s GPU platforms. Ideal candidates should...
Suggested
NVIDIA
Santa Clara, CA
3 days ago
Principal AI/ML Infra Engineer for GPU Clusters
...NVIDIA Gruppe is seeking a Principal AI and ML Infra Software Engineer to join our Hardware Infrastructure team in Santa Clara, CA. In this role, you... ...efficiency by addressing infrastructure deficiencies for GPU Clusters, fostering innovations in AI/ML research. The ideal...
Suggested
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Principal AI/ML Infra Engineer GPU Clusters & HPC
$272k - $431.25k
...NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California, to enhance the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with various teams, monitoring infrastructure performance, and implementing...
Suggested
NVIDIA
Santa Clara, CA
2 days ago
AI Benchmarking and Telemetry Engineer - NVIS
$184k - $287.5k
...of computing. An era in which our GPU acts as the brains of computers,... ...solutions for large-scale distributed systems, with proficiency in tools...
Suggested
Remote work
NVIDIA
Santa Clara, CA
2 days ago
Senior AI Infra Engineer: GPU Clusters & Kubernetes
...A leading AI technology firm in California is seeking an experienced Senior Software Engineer to develop and optimize AI infrastructure software using state-of-the-art GPU systems. Candidates should have a Bachelor's degree in a technical field and a minimum of 5 years...
Intelliswift
Sunnyvale, CA
2 days ago
Principal AI and ML Infra Software Engineer, GPU Clusters
$272k - $431.25k
...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure team. As an Engineer, you... ...* Capability in supervising and improving substantial distributed training operations using PyTorch (DDP, FSDP), NeMo, or...
NVIDIA
Santa Clara, CA
2 days ago
Principal AI Inference Systems Engineer
...experiences-from AI and data centers,... ...a Senior Staff AI Infra Engineer who is passionate... ...applications and benchmarks, with a special focus... .../ML workloads and GPU-accelerated computing... ...infrastructure, distributed systems, or performance... ...and optimize multi-GPU training performance...
Advanced Micro Devices, Inc.
Santa Clara, CA
7 days ago
Senior High-Performance AI Engineer — GPU & Multi-Agent Systems (Equity)
$184k - $287.5k
...NVIDIA USA in Santa Clara is seeking a Senior High Performance AI Engineer to design and optimize cutting-edge AI systems. The role involves... ...and Python programming skills, and hands-on experience with GPU programming. NVIDIA offers a competitive salary range of $184,0...
NVIDIA USA
Santa Clara, CA
3 days ago
Machine Learning Engineer | Python | Pytorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA
...Machine Learning Engineer | Python | Pytorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA Title: Machine Learning... ...Experience: ~3–5 years in ML/AI engineering roles owning... ...collaborating across Research, Platform/Infra, Data, and Product functions....
Enigma
San Jose, CA
1 day ago
Senior Software Engineer - Perf and Benchmarking
$182k - $242k
...Essential Cloud for AI. Built for... ...Kubernetes-native benchmarking services that measure... ...team. Break down engineering tasks into clear milestones... ...building distributed systems, high-performance... ...-critical GPU systems (CUDA, NCCL... ...benchmarking GPU clusters or multi-region...
Permanent employment
Temporary work
Casual work
Work at office
Remote work
Flexible hours
CoreWeave
Sunnyvale, CA
2 days ago
Senior High Performance AI Engineer
$148k - $235.75k
...unlimited potential of AI to define the next... ...An era in which our GPU acts as the brains... ...High Performance AI Engineer to build groundbreaking multi-agent systems for the... ...libraries, frameworks, distributed training, and... ...optimizations, evidenced by benchmark wins or published...
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior Staff Software Development Engineer- GPU/AI/ML
...computing experiences-from AI and data centers,... ...and benchmarks. You will be a member... ...from the lowest-level GPU kernels to large-scale distributed systems, shaping the... ...passion for software engineering, strong technical ownership... ...in distributed, multi-GPU systems....
Advanced Micro Devices, Inc.
Santa Clara, CA
7 days ago
AI Inference Performance Engineer
$152k - $241.5k
We optimize and benchmark GenAI inference on NVIDIA'... ...the intersection of GPU performance engineering and public accountability... ...management, and distributed inference across TensorRT... ...benchmarks, multi-turn coding, agentic... ..., and other emerging AI use cases. Collaborate...
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior AI Inference Performance Engineer (GPU/Cluster)
$152k - $241.5k
NVIDIA Gruppe is seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies.... ...involves driving industry benchmark results and architecting distributed inference systems. Required qualifications include a relevant...
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior Python GPU NumPy Library Engineer (Distributed)
$184k
...NVIDIA Gruppe is seeking an experienced software professional to design and develop GPU-accelerated Python APIs for numerical computing. This role involves architecting implementations of numerical algorithms and optimizing APIs for performance across CPU and GPU architectures...
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior Python NumPy-Scale Engineer (GPU/Distributed) Equity
$184k - $356.5k
...NVIDIA is seeking an experienced software developer to design and develop GPU-accelerated Python APIs for numerical computing. The role requires strong skills in Python, C++, CUDA, and numerical methods, with an emphasis on developing and optimizing implementations for...
NVIDIA
Santa Clara, CA
2 days ago
Principal AI/ML Engineer, AV ML Infra
$275.8k - $340.5k
...Position Overview The Principal AI/ML Engineer will lead a growing organization, guiding the AV ML Infra team in achieving its mission while shaping long‑term vision... ...Azure) to design, implement, and test scalable distributed computing and data processing solutions in the...
Local area
Remote work
Relocation
Relocation package
Flexible hours
General Motors
Sunnyvale, CA
2 days ago
Senior Cloud Infra Engineer - GPU, Kubernetes & KubeVirt
...NVIDIA Gruppe is seeking a Senior Software Engineer for GPU Cloud Infrastructure in Santa Clara, California. The role focuses on designing... ...of experience in scalable cloud services, with expertise in distributed systems and Go programming. NVIDIA offers competitive...
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Principal AI/ML Engineer, AV ML Infra
$275.8k - $340.5k
...About the team: The AV ML Infra team at GM builds ML infrastructure... ...meet the unique demands of AI and ML innovation, supporting... ...the productivity of ML engineers, and drive the adoption of cutting... ...implement, and test scalable distributed computing and data processing...
Local area
Remote work
Work from home
Relocation
Relocation package
Flexible hours
General Motors
Sunnyvale, CA
7 days ago
Member of Technical Staff (AI Infrastructure Engineer)
...AI Infra Engineer We are looking for an AI Infra engineer... ...HPC environments for distributed training of large language... ...environments Benchmark system performance, diagnose... ...training processes (Multi-Head Attention, Multi/... ...Experience managing GPU clusters and optimizing...
Perplexity AI
Palo Alto, CA
5 days ago
AI/LLM Platform Engineer Lead Kubernetes & GPU Infra
$130k - $170k
...NTT DATA is hiring a Platform Engineer in Santa Clara, California, to lead the design and operation of scalable infrastructure supporting AI/LLM-based solutions. The ideal candidate will have over 5 years of experience in Platform Engineering. Your role involves managing...
NTT DATA
Santa Clara, CA
2 days ago
AI Engineer
...neuron™, a unified AI-native platform for data... ...seeking a motivated AI/ML Engineer to design, build, and... ...about evaluation, benchmarking, and system reliability... ...You Will Work On ~ Multi-agent orchestration systems... ..., DevOps, or distributed systems Benefits...
Teserac, Inc.
Sunnyvale, CA
6 days ago
Senior Data Center Performance Engineer - Benchmarking and Optimization
$184k - $287.5k
...unlimited potential of AI to define the next... ...An era in which our GPU acts as the brains... ...the way up to large multi-node NVLink domain rack... ...a highly motivated engineer to lead performance benchmarking and optimization... ...communications (NCCL), distributed training and inference...
Remote work
NVIDIA
Santa Clara, CA
3 days ago
Staff AI/ML Fullstack Engineer - AV ML Infra
...About the team The AV ML Infra team builds end‑to‑end ML platforms... ...developer‑facing products to support AI and ML innovation across teams... ...As a Staff AI/ML Full‑Stack Engineer, you will design and build end... ...implement, and test scalable distributed computing solutions. Project...
Israelvcforum
Sunnyvale, CA
2 days ago
Distributed Software Engineer
...Distributed Software Engineer Bengaluru, Karnataka, India; Sunnyvale CA or Toronto... ...builds the world's largest AI chip, 56 times larger than GPUs... ...OpenAI recently announced a multi-year partnership with... ..., over 10 times faster than GPU-based hyperscale cloud inference...
CEREBRAS SYSTEMS INC.
Sunnyvale, CA
4 days ago
Remote: AI Benchmarking & Telemetry Engineer for NVIS
$184k - $287.5k
...A leading technology company seeks an AI Benchmarking and Telemetry Engineer in Santa Clara, California. In this role, you will develop benchmarking approaches for HPC and AI tasks, maintain telemetry frameworks, and collaborate with engineering teams to optimize performance...
Remote work
NVIDIA
Santa Clara, CA
2 days ago
Senior AI Compute Engineer - NVIS
$148k - $235.75k
...NVIDIA is looking for a Senior AI Compute Engineer to join its Infrastructure... ...and ability to prioritize/multi-task easily with limited supervision... ...time. ~ Experience with benchmarking tools such as HPL, NCCL... ...experience. Experience with GPU (Graphics Processing Unit)...
Remote work
NVIDIA
Santa Clara, CA
12 days ago
Senior GPU AI Open-Source Software Engineer
...Advanced Micro Devices is seeking a principal software developer to join the ROCm GPU-compute team in Santa Clara, California. The ideal candidate will have over 10 years of software development experience in C/C++, Python, and GPU technologies. This role involves developing...
Advanced Micro Devices, Inc.
Santa Clara, CA
2 days ago
AI Inference Performance Engineer Scale LLMs & GPU Clusters
$124k - $195.5k
...NVIDIA Corporation is seeking an AI Inference Performance Engineer - New College Grad 2026 in Santa Clara. This role involves optimizing AI inference benchmarks using NVIDIA’s accelerators and working with various teams on performance enhancements. Applicants should have...
NVIDIA
Santa Clara, CA
2 days ago
