AI Inference Performance Engineer - New College Grad 2026
$124k - $195.5k
NVIDIA Corporation
Locations: US, CA, Santa Clara
Time type: Full time
Posted on: Posted Yesterday
Job requisition ID: JR2014441

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

**What You Will Be Doing:**

* Drive industry benchmark results: own the end-to-end optimization pipeline, implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
* Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks, multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its extreme on large-scale LLM-MoE models, vision-language models, video diffusion models, recommendation, and speech workloads.
* Architect distributed inference: design and optimize execution from single-GPU to rack-scale clusters, managing performance across clusters of GPUs.
* Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
* Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
* Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

**What We Need To See:**

* BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
* 2+ years of relevant software development experience.
* Strong Python or C++ programming, software design, and software engineering skills.
* Expertise with a DL framework such as PyTorch or JAX.
* Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
* Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

**Ways To Stand Out From The Crowd:**

* Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler in inference, deployment, algorithms, or implementation.
* Prior experience with performance modeling, profiling, debug, and code optimization of a DL/HPC/high-performance application.
* Experience with scale-out inference orchestration (MPI, NCCL, K8S) on large GPU clusters.
* Expertise in kernel development (CUTLASS, cuteDSL, tilelang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion). Architectural knowledge of CPU, GPU, FPGA or other DL accelerators; GPU programming experience (CUDA).
* Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

GPU deep learning has provided the foundation for machines to learn, perceive, reason and solve problems posed using human language. The GPU started out as the engine for simulating human imagination, conjuring up the outstanding virtual worlds of video games and Hollywood films. Now, NVIDIA's GPU runs deep learning algorithms, simulating human intelligence, and acts as the brain of computers, robots and self-driving cars that can perceive and understand the world. Just as human imagination and intelligence are linked, computer graphics and artificial intelligence come together in our architecture. Two modes of the human brain, two modes of the GPU. This may explain why NVIDIA GPUs are used broadly for deep learning, and NVIDIA is increasingly known as “the AI computing company.” Come, join our DL Architecture team, where you can help build the real-time, cost-effective computing platform driving our success in this exciting and quickly growing field.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 124,000 USD - 195,500 USD for Level 2, and 152,000 USD - 241,500 USD for Level 3.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 7, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Vacancy posted 4 days ago

