AI Inference Performance Engineer
$152k - $241.5k
NVIDIA Gruppe
We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:
- Drive industry benchmark results: Own the end-to-end optimization pipeline; implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
- Define and optimize cutting-edge workloads: Identify and shape next-generation inference benchmarks, multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its extreme on large-scale LLM-MoE models, vision-language models, video diffusion models, recommendation, and speech workloads.
- Architect distributed inference: Design and optimize execution from single-GPU to rack-scale clusters, managing performance across clusters of GPUs.
- Establish performance methodology: Apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
- Influence the ecosystem: Contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
- Technical leadership: Raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

What We Need To See:
- BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
- 5+ years of relevant software development experience.
- Strong Python or C++ programming, software design, and software engineering skills.
- Expertise with a DL framework such as PyTorch or JAX.
- Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
- Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

Ways To Stand Out From The Crowd:
- Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler in inference, deployment, algorithms, or implementation.
- Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application.
- Experience with scale-out inference orchestration (MPI, NCCL, K8S) on large GPU clusters.
- Expertise in kernel development (CUTLASS, cuteDSL, tilelang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion).
- Architectural knowledge of CPU, GPU, FPGA, or other DL accelerators; GPU programming experience (CUDA).
- Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

