AI Inference Performance Engineer Scale LLMs & GPU Clusters
$124k - $195.5k
NVIDIA
NVIDIA Corporation is seeking an AI Inference Performance Engineer - New College Grad 2026 in Santa Clara. This role involves optimizing AI inference benchmarks using NVIDIA’s accelerators and working with various teams on performance enhancements. Applicants should have a solid background in software engineering and deep learning frameworks. The position offers a competitive salary and benefits, with a range of 124,000 USD to 195,500 USD for Level 2. Join a pioneering team that is shaping the AI landscape.
$184k - $356.5k
...looking for a Senior Software Engineer specializing in Deep Learning Inference in Santa Clara, California... ...will design and optimize GPU-accelerated software critical for advanced AI applications, contributing... .... The role includes performance optimization and collaboration...
Performance
- ...experiences-from AI and data... ...Staff AI Infra Engineer who is... ...improving the performance of key applications... ...and GPU-accelerated computing... ...Language Models (LLMs) and Agentic... ...training and inference on AMD GPUs,... ...on GPU clusters, including large-scale training and...
Performance
$184k - $287.5k
NVIDIA is seeking a Senior Systems Software Engineer focusing on GPU Performance at Scale. This role involves driving innovation in AI and GPU computing, collaborating with developers and researchers to enhance system workflows. Key duties include leading performance practices...
Performance
- NVIDIA Gruppe is seeking a Software Performance at Scale Intern in Santa Clara, CA,... ...opportunity to work with a leading engineering team focused on AI and computing. The role includes collaborating... ...optimizations in large GPU clusters. The ideal intern is currently enrolled...
Performance, Internship
$184k - $287.5k
...and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency... ...and implement high-performance inference stacks, optimize GPU kernels and compilers,... ...deployments on GPU clusters across clouds. Conduct...
Performance
$160k - $322k
...Santa Clara is seeking a Senior Technical Marketing Engineer focused on GPUs and scale-up architecture. The role involves showcasing NVIDIA's GPU architecture and server-level platforms, aiming to maximize performance for AI applications. The ideal candidate will have at...
Performance
$184k - $356.5k
NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming...
Performance
$152k - $241.5k
...invention of the GPU in 1999 sparked the growth... ...ignited modern AI — the next era of... ...-tier AI Compiler Engineers to drive innovation... ...is possible in AI performance and help build the... ...on a global scale. What you’ll be... ...AI workloads (both inference and training) and...
Performance
$124k - $195.5k
...cutting edge of AI infrastructure... ..., we need engineers who can model,... ...level traffic at scale. If you have a passion for performance analysis, a... ...datacenter and GPU systems. What you... ..., and clustering techniques such... ...fundamentals, LLMs, and modern inference serving frameworks...
Performance
$160k - $253k
...transforming into AI factories, and NVIDIA... ...computing is the engine of artificial intelligence... ...integrate high performance compute, networking... ...to power AI at scale. We are looking for... ...showcasing NVIDIA's GPU architecture, server... ...efficiency for AI inference & training. What you...
Performance
$152k - $241.5k
...motivated Software Engineer to join our growing AI and Generative... ...of large-scale AI systems powering... ...applications in LLMs, agentic AI, retrieval... ...ML training, inference, and generative... ...platforms supporting GPU clusters, fault‑tolerant... ..., and high‑performance AI workloads. Develop...
Performance
- ...Department: Backend Engineer · Work type: On-... ...About Archetype AI Archetype AI is developing... ...for building performant, scalable, and... ...into production—at scale, with reliability,... ...-latency AI model inference and data services.... ...performance across GPU clusters, cloud infrastructure...
Performance, Full time
$131k - $175k
...Senior Hardware Systems Engineer – AI Rack & Cluster Infrastructure Arista Networks... ...standards of quality and performance in everything we do. Job... ...engineers, to deliver rack-scale solutions for the world's... ...cooling into high-density GPU environments, ensuring performance...
Performance, Remote work, Flexible hours
$207k - $300k
Senior Research Engineer, On-Device Inference, Robotics, DeepMind... ...Language Models (LLMs), including... ...focused on high-performance inference. Understanding... ...with AI accelerators (e.g... ...them to setup large-scale tests and deploy... ...techniques across GPU, TPU, and CPU architectures...
Performance, Full time
- ...NVIDIA Gruppe seeks a skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for high-performance computing workloads. This role involves collaboration with various teams to ensure effective and reliable cluster performance. Key responsibilities...
Performance
$272k - $431.25k
...Principal Rack Scale Systems Infrastructure Engineer NVIDIA has been transforming... ...potential of AI to define the next era... ...An era in which our GPU acts as the brains... ...silicon, or other high-performance computing systems.... ...experience with rack- or cluster-scale systems...
Performance, Shift work
- ...experiences-from AI and data centers... ...Quality Engineer to serve as the... ...on AMD Instinct™ GPU platforms. You will... ...framework, workload, performance, stress, stability, scale-out, and system-... ...training and inference (PyTorch, vLLM,... ...and large-scale cluster software ~ System...
Performance, Contract work, Shift work
$152k - $241.5k
...NVIDIA's invention of the GPU in 1999 sparked the growth... ...ignited modern AI — the next era of computing... ...Deep Learning Compiler Engineer. NVIDIA is hiring software... ...backbone of NVIDIA’s inference engine, spanning across... ...deliver leading inference performance, fast build time,...
Performance
- ...Machine Learning Engineer - Inference Serving Frameworks... ...building rack-level AI inference systems.... ...for data center-scale inference serving.... ...inference serving and cluster scheduling... ...to architect high‑performance inference stacks and... ...‑level debugging. GPU kernel development...
Performance, Full time
- ...located in Santa Clara, CA, is seeking a Senior Systems Software Engineer focused on GPU Performance at Scale. This role entails leading performance practices in large-scale GPU infrastructure and aligning AI workloads with next-generation datacenter builds. The ideal...
Performance
- ...seeking a technical leader for the GPU AI/HPC Infrastructure team. You will... ...cutting-edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will... ...+ years of experience with large-scale infrastructure, strong programming...
Performance
$224k - $356.5k
...unlimited potential of AI to define the... ...in which our GPU acts as the... ...of AI and high-performance computing. As a... ...Deep Learning Engineer — Model Evaluation... ..., including LLMs, RAG systems, agents... ...on large GPU clusters. Collaborate... ...model training, inference, and product divisions...
Performance
$20 - $71 per hour
NVIDIA is seeking a Software Performance at Scale Intern in Santa Clara, CA. In this role, you will collaborate with engineers to improve software performance across large GPU clusters and analyze workloads to identify optimization opportunities. Candidates should be enrolled...
Performance, Hourly pay, Internship
$152k - $241.5k
NVIDIA Gruppe is seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies. The role involves driving industry benchmark results and architecting distributed inference systems. Required qualifications include a relevant...
Performance
$200k - $322k
...self‑motivated senior engineer for the Aerial Omniverse... ...will design and implement GPU kernels that apply time... ...and NIC budgets at scale. You will work with the... ...need to see: PhD in high‑performance computing, computer... ...existing vacancy. NVIDIA uses AI tools in its recruiting...
Performance
$20 - $71 per hour
NVIDIA Corporation is seeking a Software Performance at Scale Intern in Santa Clara, CA. This role involves working with engineering teams to optimize software performance on large GPU clusters. Candidates should be enrolled in a relevant degree program and have strong...
Performance, Hourly pay, Internship
- ...computing experiences-from AI and data centers, to PCs, gaming... ...for a strategic software engineering lead who is passionate about improving the performance of key applications and... ...techniques for optimizing scale-up and scale-out inference. Develop methods and tooling...
Performance
$120.1k - $225.7k
...End-to-End Inference Optimization: Lead... ...and implement high-performance inference frameworks... ...to build a robust AI inference technical... ...Science, Electronic Engineering, AI, or related fields... ...ultra-large-scale models is highly... ...large-scale inference clusters or driving AI...
Performance, Relocation package
- ...potential of generative AI to power the... ...The role: Analog Design Engineer, Senior / Staff /Sr. Staff... ...Artificial Intelligence Inference Accelerator and High-Speed... ...Die-2-Die Interface for scale-out. Job scope includes... ...circuit design, system level performance analysis, design test...
Performance, 3 days per week
$152k - $241.5k
...for inventing the GPU and driving breakthroughs... ...graphics, high-performance computing, and... ...everything from generative AI to autonomous... ...MARS), builds and scales the infrastructure... ...researchers and engineers to develop the... ...groundbreaking GPU compute clusters that run demanding...
Performance, Work experience placement

