AI Inference Performance Engineer Scale LLMs & GPU Clusters

$124k - $195.5k

NVIDIA

NVIDIA Corporation is seeking an AI Inference Performance Engineer - New College Grad 2026 in Santa Clara. This role involves optimizing AI inference benchmarks using NVIDIA’s accelerators and working with various teams on performance enhancements. Applicants should have a solid background in software engineering and deep learning frameworks. The position offers a competitive salary and benefits, with a range of 124,000 USD to 195,500 USD for Level 2. Join a pioneering team that is shaping the AI landscape.

Vacancy posted 12 hours ago

Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer Scale LLMs & GPU Clusters in Santa Clara, CA vacancy

Senior DL Inference Engineer - GPU-Accelerated AI, Equity
$184k - $356.5k
...looking for a Senior Software Engineer specializing in Deep Learning Inference in Santa Clara, California... ...will design and optimize GPU-accelerated software critical for advanced AI applications, contributing... .... The role includes performance optimization and collaboration...
Performance
NVIDIA Gruppe
Santa Clara, CA
13 hours ago
Principal AI Inference Systems Engineer
...experiences-from AI and data... ...Staff AI Infra Engineer who is... ...improving the performance of key applications... ...and GPU-accelerated computing... ...Language Models (LLMs) and Agentic... ...training and inference on AMD GPUs,... ...on GPU clusters, including large-scale training and...
Performance
Advanced Micro Devices, Inc.
Santa Clara, CA
5 days ago
Senior GPU Performance Engineer - Scale & AI Workloads
$184k - $287.5k
NVIDIA is seeking a Senior Systems Software Engineer focusing on GPU Performance at Scale. This role involves driving innovation in AI and GPU computing, collaborating with developers and researchers to enhance system workflows. Key duties include leading performance practices...
Performance
NVIDIA
Santa Clara, CA
4 days ago
Software Performance at Scale Intern: GPU Clusters & AI
NVIDIA Gruppe is seeking a Software Performance at Scale Intern in Santa Clara, CA,... ...opportunity to work with a leading engineering team focused on AI and computing. The role includes collaborating... ...optimizations in large GPU clusters. The ideal intern is currently enrolled...
Performance
Internship
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior Software Engineer, AI Inference Systems
$184k - $287.5k
...and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency... ...and implement high-performance inference stacks, optimize GPU kernels and compilers,... ...deployments on GPU clusters across clouds. Conduct...
Performance
NVIDIA Gruppe
Santa Clara, CA
13 hours ago
Senior GPU Platform Marketing Engineer AI Scale-Up Systems
$160k - $322k
...Santa Clara is seeking a Senior Technical Marketing Engineer focused on GPUs and scale-up architecture. The role involves showcasing NVIDIA's GPU architecture and server-level platforms, aiming to maximize performance for AI applications. The ideal candidate will have at...
Performance
NVIDIA Gruppe
Santa Clara, CA
12 hours ago
Senior AI Inference Systems Engineer: GPU-Optimized, Cloud
$184k - $356.5k
NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming...
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Compiler Engineer - AI Inference
$152k - $241.5k
...invention of the GPU 1999 sparked the growth... ...ignited modern AI — the next era of... ...-tier AI Compiler Engineers to drive innovation... ...is possible in AI performance and help build the... ...on a global scale. What you’ll be... ...AI workloads (both inference and training) and...
Performance
NVIDIA
Santa Clara, CA
4 days ago
Systems Performance Engineer, Agentic AI Workloads - New College Grad 2026
$124k - $195.5k
...cutting edge of AI infrastructure... ..., we need engineers who can model,... ...level traffic at scale. If you have a passion for performance analysis, a... ...datacenter and GPU systems. What you... ..., and clustering techniques such... ...fundamentals, LLMs, and modern inference serving frameworks...
Performance
NVIDIA
Santa Clara, CA
5 days ago
Senior Technical Marketing Engineer - GPU and System Architecture
$160k - $253k
...transforming into AI factories, and NVIDIA... ...computing is the engine of artificial intelligence... ...integrate high performance compute, networking... ...to power AI at scale. We are looking for... ...showcasing NVIDIA's GPU architecture, server... ...efficiency for AI inference & training. What you...
Performance
NVIDIA Gruppe
Santa Clara, CA
13 hours ago
Senior Software Engineer, Generative AI Systems
$152k - $241.5k
...motivated Software Engineer to join our growing AI and Generative... ...of large-scale AI systems powering... ...applications in LLMs, agentic AI, retrieval... ...ML training, inference, and generative... ...platforms supporting GPU clusters, fault‑tolerant... ..., and high‑performance AI workloads. Develop...
Performance
NVIDIA Gruppe
Santa Clara, CA
12 hours ago
Senior Backend Engineer: Distributed Systems for AI Inference
...Department: Backend Engineer · Work type: On-... ...About Archetype AI Archetype AI is developing... ...for building performant, scalable, and... ...into production—at scale, with reliability,... ...-latency AI model inference and data services.... ...performance across GPU clusters, cloud infrastructure...
Performance
Full time
Neara
Palo Alto, CA
12 hours ago
Senior Hardware Systems Engineer - AI Rack & Cluster Infrastructure
$131k - $175k
...Senior Hardware Systems Engineer – AI Rack & Cluster Infrastructure Arista Networks... ...standards of quality and performance in everything we do. Job... ...engineers, to deliver rack-scale solutions for the world's... ...cooling into high-density GPU environments, ensuring performance...
Performance
Remote work
Flexible hours
Arista Networks, Inc.
Santa Clara, CA
5 days ago
Senior Research Engineer, On-Device Inference, Robotics, DeepMind
$207k - $300k
Senior Research Engineer, On-Device Inference, Robotics, DeepMind... ...Language Models (LLMs), including... ...focused on high-performance inference. Understanding... ...with AI accelerators (e.g... ...them to set up large-scale tests and deploy... ...techniques across GPU, TPU, and CPU architectures...
Performance
Full time
Google Inc.
Mountain View, CA
2 days ago
Senior GPU HPC Cluster Engineer Equity Eligible
...NVIDIA Gruppe seeks a skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for high-performance computing workloads. This role involves collaboration with various teams to ensure effective and reliable cluster performance. Key responsibilities...
Performance
NVIDIA Gruppe
Santa Clara, CA
13 hours ago
Principal Software Engineer - Rack Scale Systems Infrastructure
$272k - $431.25k
...Principal Rack Scale Systems Infrastructure Engineer NVIDIA has been transforming... ...potential of AI to define the next era... ...An era in which our GPU acts as the brains... ...silicon, or other high-performance computing systems.... ...experience with rack- or cluster-scale systems...
Performance
Shift work
NVIDIA
Santa Clara, CA
4 days ago
Principal Software Quality Engineer - GPU & Machine Learning
...experiences-from AI and data centers... ...Quality Engineer to serve as the... ...on AMD Instinct™ GPU platforms. You will... ...framework, workload, performance, stress, stability, scale-out, and system-... ...training and inference (PyTorch, vLLM,... ...and large-scale cluster software ~ System...
Performance
Contract work
Shift work
Advanced Micro Devices, Inc.
San Jose, CA
5 days ago
Senior Compiler Engineer, AI Inference Performance
$152k - $241.5k
...NVIDIA's invention of the GPU 1999 sparked the growth... ...ignited modern AI — the next era of computing... ...Deep Learning Compiler Engineer. NVIDIA is hiring software... ...backbone of NVIDIA’s inference engine, spanning across... ...deliver leading inference performance, fast build time,...
Performance
NVIDIA
Santa Clara, CA
4 days ago
Inference Engineer
...Machine Learning Engineer - Inference Serving Frameworks... ...building rack-level AI inference systems.... ...for data center-scale inference serving.... ...inference serving and cluster scheduling... ...to architect high‑performance inference stacks and... ...‑level debugging. GPU kernel development...
Performance
Full time
Acceler8 Talent
Santa Clara, CA
3 days ago
Senior GPU Performance Systems Engineer at Scale
...located in Santa Clara, CA, is seeking a Senior Systems Software Engineer focused on GPU Performance at Scale. This role entails leading performance practices in large-scale GPU infrastructure and aligning AI workloads with next-generation datacenter builds. The ideal...
Performance
NVIDIA Corporation
Santa Clara, CA
4 days ago
Senior AI/HPC GPU Cluster Architect (Equity)
...seeking a technical leader for the GPU AI/HPC Infrastructure team. You will... ...cutting-edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will... ...+ years of experience with large-scale infrastructure, strong programming...
Performance
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior Deep Learning Engineer - Model Evaluation & AI Systems
$224k - $356.5k
...unlimited potential of AI to define the... ...in which our GPU acts as the... ...of AI and high-performance computing. As a... ...Deep Learning Engineer — Model Evaluation... ..., including LLMs, RAG systems, agents... ...on large GPU clusters. Collaborate... ...model training, inference, and product divisions...
Performance
NVIDIA Gruppe
Santa Clara, CA
13 hours ago
GPU Software Performance Intern — Scale & AI
$20 - $71 per hour
NVIDIA is seeking a Software Performance at Scale Intern in Santa Clara, CA. In this role, you will collaborate with engineers to improve software performance across large GPU clusters and analyze workloads to identify optimization opportunities. Candidates should be enrolled...
Performance
Hourly pay
Internship
NVIDIA
Santa Clara, CA
12 hours ago
Senior AI Inference Performance Engineer (GPU/Cluster)
$152k - $241.5k
NVIDIA Gruppe is seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies. The role involves driving industry benchmark results and architecting distributed inference systems. Required qualifications include a relevant...
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Senior Multi‑GPU Signal Processing and System Architecture Engineer
$200k - $322k
...self‑motivated senior engineer for the Aerial Omniverse... ...will design and implement GPU kernels that apply time... ...and NIC budgets at scale. You will work with the... ...need to see: PhD in high‑performance computing, computer... ...existing vacancy. NVIDIA uses AI tools in its recruiting...
Performance
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Performance at Scale Intern: Optimize GPU Cluster Apps
$20 - $71 per hour
NVIDIA Corporation is seeking a Software Performance at Scale Intern in Santa Clara, CA. This role involves working with engineering teams to optimize software performance on large GPU clusters. Candidates should be enrolled in a relevant degree program and have strong...
Performance
Hourly pay
Internship
NVIDIA Corporation
Santa Clara, CA
4 days ago
Principal Software Engineer (AI Inference / Distributed Systems)
...computing experiences-from AI and data centers, to PCs, gaming... ...for a strategic software engineering lead who is passionate about improving the performance of key applications and... ...techniques for optimizing scale-up and scale-out inference. Develop methods and tooling...
Performance
Advanced Micro Devices, Inc.
Santa Clara, CA
1 day ago
Sr. AI Inference Systems Engineer
$120.1k - $225.7k
...End-to-End Inference Optimization: Lead... ...and implement high-performance inference frameworks... ...to build a robust AI inference technical... ...Science, Electronic Engineering, AI, or related fields... ...ultra-large-scale models is highly... ...large-scale inference clusters or driving AI...
Performance
Relocation package
Tencent
Palo Alto, CA
2 days ago
Senior/Staff Analog IC Design Engineer - AI Inference
...potential of generative AI to power the... ...The role: Analog Design Engineer, Senior / Staff /Sr. Staff... ...Artificial Intelligence Inference Accelerator and High-Speed... ...Die-2-Die Interface for scale-out. Job scope includes... ...circuit design, system level performance analysis, design test...
Performance
3 days per week
d-Matrix
Santa Clara, CA
4 days ago
Senior GPU Supercomputer Scheduler Engineer
$152k - $241.5k
...for inventing the GPU and driving breakthroughs... ...graphics, high-performance computing, and... ...everything from generative AI to autonomous... ...MARS), builds and scales the infrastructure... ...researchers and engineers to develop the... ...groundbreaking GPU compute clusters that run demanding...
Performance
Work experience placement
NVIDIA Gruppe
Santa Clara, CA
4 days ago
