AI Inference Performance Engineer

$152k - $241.5k

NVIDIA Gruppe

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry's performance standards across language model, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:
  • Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
  • Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks such as multi-turn coding, agentic workflows, and other emerging AI use cases.
  • Collaborate with framework and kernel teams to push performance to its limits on large-scale mixture-of-experts (MoE) LLMs, vision-language models, and video diffusion models, as well as recommendation and speech workloads.
  • Architect distributed inference: design and optimize execution, and manage performance, at every scale from a single GPU to rack-scale clusters.
  • Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
  • Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects, and partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
  • Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

What We Need To See:
  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
  • 5+ years of relevant software development experience.
  • Strong Python or C++ programming, software design, and software engineering skills.
  • Expertise with a DL framework such as PyTorch or JAX.
  • Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
  • Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

Ways To Stand Out From The Crowd:
  • Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler, in inference, deployment, algorithms, or implementation.
  • Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application.
  • Experience with scale-out inference orchestration (MPI, NCCL, K8s) on large GPU clusters.
  • Expertise in kernel development (CUTLASS, CuTe DSL, TileLang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion).
  • Architectural knowledge of CPUs, GPUs, FPGAs, or other DL accelerators; GPU programming experience (CUDA).
  • Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

Your base salary will be determined by your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer in Santa Clara, CA vacancy
  • $152k - $241.5k

     ...deep learning ignited modern AI — the next era of computing —...  ...seeking top-tier AI Compiler Engineers to drive innovation within our...  ...of what is possible in AI performance and help build the technology...  ...problems for AI workloads (both inference and training) and... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...deep learning ignited modern AI — the next era of computing —...  ...AI & Deep Learning Compiler Engineer. NVIDIA is hiring software engineers...  ...the backbone of NVIDIA’s inference engine, spanning across data...  ...deliver leading inference performance, fast build time, reduced memory... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $184k - $287.5k

     ...We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...generation computing experiences-from AI and data centers, to PCs,...  ...for a strategic software engineering lead who is passionate about improving the performance of key applications and benchmarks...  ...optimizing scale-up and scale-out inference. Develop methods and tooling... 
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    1 day ago
  • $152k - $241.5k

     ...learning and eager to work on cutting-edge AI technology for safety-critical applications?...  ...NVIDIA's TensorRT team as a Senior Software Engineer, and be at the forefront of technology, enabling high-performance AI inference solutions for automotive safety and other specialized... 
    Performance

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...generation computing experiences-from AI and data centers, to PCs, gaming...  ...for a Senior Staff AI Infra Engineer who is passionate about improving the performance of key applications and benchmarks...  ...and accelerate LLM training and inference on AMD GPUs, improving kernel, communication... 
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    19 hours ago
  • $184k - $356.5k

    NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

    NVIDIA is hiring an AI & Deep Learning Compiler Engineer for its Deep Learning & AI Compiler (DLC) team. What you’ll be doing Analyzing deep learning...  ...of deep learning software. Defining public APIs, performance optimizations and analysis, and crafting and implementing... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $150k - $275k

     ...cutting-edge tech company in San Jose is seeking a Supercomputing Engineer to ensure the reliability of its inference servers. This role involves designing and executing test suites, analyzing performance, and collaborating with engineering teams. Ideal candidates will... 
    Performance

    Etched

    San Jose, CA
    3 days ago
  • $152k - $287.5k

     ...seeking a Senior Machine Learning Applications and Compiler Engineer in Santa Clara, California. This role involves developing algorithms for their LPX inference and compiler stack, optimizing the performance of neural network workloads on NVIDIA platforms. Ideal candidates... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $135.8k - $237.05k

     ...Mountain View, CA, USA Senior Backend Engineer, ML Inference Systems Location Mountain View, CA, USA Department AI & Machine Learning Requisition ID JOBREQ...  ...of daily decisions, with a focus on the performance, reliability, and scalability of inference... 
    Performance
    Work at office
    Worldwide
    Relocation package

    Unity Technologies

    Mountain View, CA
    1 day ago
  • $120.1k - $225.7k

     ...Role Entails End-to-End Inference Optimization: Lead the...  ...: Design and implement high-performance inference frameworks; optimize...  ...team members to build a robust AI inference technical ecosystem...  ...Computer Science, Electronic Engineering, AI, or related fields; significant... 
    Performance
    Relocation package

    Tencent

    Palo Alto, CA
    2 days ago
  • $207k - $300k

    Senior Research Engineer, On-Device Inference, Robotics, DeepMind, Mountain...  ...PyTorch, particularly focused on high-performance inference. Understanding of techniques to align model architectures with AI accelerators (e.g., distillation).... 
    Performance
    Full time

    Google Inc.

    Mountain View, CA
    2 days ago
  •  ...California seeks an ML Infrastructure Engineer to build and operate inference systems for their automation stack....  ...for model inference, optimizing performance, and collaborating with research teams...  ...to make a significant impact in robotics and AI.
    Performance

    Rhoda AI

    Palo Alto, CA
    3 days ago
  • NVIDIA is looking for a Deep Learning Software Engineer to analyze and optimize the performance of our inference ecosystem. This role involves developing benchmarking...  ...well as working cross-functionally across various AI domains. The ideal candidate will have a relevant... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal...  ...deep learning, specifically in inference. This role involves profiling,...  ...collaborating with teams to advance AI solutions. A strong... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $120k - $180k

    Application Engineer - Low Power Edge Inference (DIB Focus) About this Role We are seeking an Application Engineer...  ...the SoC Profile and improve system performance (latency, energy per inference,...  ...in deploying cutting‑edge edge AI silicon into real‑world, resource‑constrained... 
    Performance
    For contractors
    Internship

    TetraMem Inc

    San Jose, CA
    3 days ago
  • 1600 NIO USA, Inc. is looking for a Senior AI Inference Infrastructure Software Engineer in San Jose, California. The role involves designing and optimizing high-performance inference systems for Large Language and Vision-Language Models across various environments. Candidates... 
    Performance
    Flexible hours

    1600 NIO USA, Inc.

    San Jose, CA
    4 days ago
  • NVIDIA Gruppe is looking for a skilled engineer to join their TensorRT Edge-LLM team in Santa...  ...role involves developing a state-of-the-art inference framework for large language models and optimizing it for real-time performance on embedded platforms. Candidates should... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • NVIDIA Gruppe in Santa Clara, California is seeking a Senior Software Engineer specializing in Deep Learning Inference. In this role, you will craft and develop high-performance software tailored for scalable platforms while collaborating with experts in the field. The... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...looking for a Senior DL Algorithms Engineer for LLM/Omni model...  ...engineers who are mindful of performance analysis and optimization to...  ...technology company that leads the AI revolution. What you will be...  ...Cosmos) on NVIDIA’s accelerated inference SW stack. Contribute new... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $160k - $282k

     ...What to Expect The AI inferencing Hardware team is responsible for designing the...  ...house AI silicon, to deliver inferencing performance while meeting efficiency, cost, reliability...  ...'ll Bring ~ Degree in Electrical Engineering or equivalent experience ~3+ years of... 
    Performance
    Hourly pay
    Full time
    Temporary work
    Flexible hours

    Tesla

    Palo Alto, CA
    2 days ago
  • Senior / Principal Machine Learning Engineer - Inference Serving Frameworks Full-time | On-site | Bay...  ...-mode startup building rack-level AI inference systems. Our differentiated system...  ...and software experts to architect high‑performance inference stacks and design resource... 
    Performance
    Full time

    Acceler8 Talent

    Santa Clara, CA
    3 days ago
  • $124k - $195.5k

     ...working at the cutting edge of AI infrastructure. As agentic...  ...modern datacenters, we need engineers who can model, simulate, and...  ...scale. If you have a passion for performance analysis, a strong...  ...fundamentals, LLMs, and modern inference serving frameworks Ways... 
    Performance

    NVIDIA

    Santa Clara, CA
    5 hours ago
  • $320k

     ...and lead execution for agentic AI systems for the CUDA ecosystem,...  ...and measurable success metrics (performance, quality, reliability, developer...  ...frameworks, distributed training, and inference/serving—and with model and research/engineering teams. Scale impact through... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...Systems builds the world's largest AI chip, 56 times larger than...  ...‑leading training and inference speeds and empowers machine learning...  ...Role We are hiring a Senior Performance Analyst to join our Product...  ...Collaborate with Product and Engineering to identify where competitors... 
    Performance
    Contract work
    Shift work

    Cerebras

    Sunnyvale, CA
    2 days ago
  • $224k - $356.5k

     ...into the unlimited potential of AI to define the next era of...  ...at the forefront of AI and high-performance computing. As a Senior / Principal Deep Learning Engineer — Model Evaluation & AI Systems...  ...Work alongside model training, inference, and product divisions to provide... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $184k - $287.5k

     ...for a Senior Systems Software Engineer for our Robotics Team working...  ...for the next wave of AI-powered physical agents is a...  ...simulation, training, and edge inference. NVIDIA’s ISAAC platform binds...  ...software for functionality and performance Collaborate with other teams... 
    Performance

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $272k - $431.25k

     ...this transition flawless, high-performing, and secure for millions of...  ...We are looking for a Principal Engineer to serve as a key technical leader in deploying advanced AI agent frameworks and local runtimes...  .... By combining powerful local inference (Nemotron models) with robust... 
    Performance
    Local area
    Worldwide

    NVIDIA

    Santa Clara, CA
    19 hours ago
  • $255.85k - $361.2k

     ...Overview We are seeking a Principal Engineer to define and architect the...  ...generation of distributed AI systems across heterogeneous...  ...state, locality, and performance at system scale. Key Responsibilities...  ...with AI/ML systems, inference infrastructure, or large‑scale... 
    Performance
    Local area
    Shift work

    Intel Corporation

    Santa Clara, CA
    4 days ago