AI Inference Performance Engineer

$152k - $241.5k

NVIDIA Gruppe

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:
Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks, such as multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its limits on large-scale mixture-of-experts (MoE) LLMs, vision-language models, video diffusion models, recommendation, and speech workloads.
Architect distributed inference: design and optimize execution at every scale, from a single GPU to rack-scale GPU clusters.
Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

What We Need To See:
BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
5+ years of relevant software development experience.
Strong Python or C++ programming, software design, and software engineering skills.
Expertise with a DL framework such as PyTorch or JAX.
Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

Ways To Stand Out From The Crowd:
Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler, in inference, deployment, algorithms, or implementation.
Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application.
Experience with scale-out inference orchestration (MPI, NCCL, K8s) on large GPU clusters.
Expertise in kernel development (CUTLASS, CuTe DSL, TileLang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion).
Architectural knowledge of CPU, GPU, FPGA, or other DL accelerators; GPU programming experience (CUDA).
Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you (based on the AI Inference Performance Engineer vacancy in Santa Clara, CA)

Edge Inference Engineer: Local AI Latency Optimizer
...California is seeking a talented individual to optimize inference engines for local environments, impacting the future of AI. Applicants should have a strong background in... ...development, with experience in profiling performance issues. The successful candidate will work...
Performance
Local area
Intel
Santa Clara, CA
3 days ago
Compiler Engineer - AI Inference
$152k - $241.5k
...deep learning ignited modern AI — the next era of computing —... ...seeking top‑tier AI Compiler Engineers to drive innovation within our... ...of what is possible in AI performance and help build the technology... ...problems for AI workloads (both inference and training) and...
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Senior AI Systems Engineer — SGLang & Inference on GPUs
...leading technology company is seeking a skilled engineer to optimize deep learning frameworks and enhance GPU kernel performance. The ideal candidate excels in collaborative... ...a focus on innovative solutions and advancing AI technologies.
Performance
Advanced Micro Devices
Santa Clara, CA
3 days ago
Senior LLM Performance Engineer - GPU Inference
$184k - $356.5k
A leading AI computing company in California is seeking a Senior Deep Learning Software Engineer focused on performance optimization of LLM models. You will analyze and enhance LLM inference performance, working in cross-collaborative teams to implement cutting-edge algorithms...
Performance
Full time
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior DL Inference Engineer - GPU Optimization Equity
...seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal... ...deep learning, specifically in inference. This role involves profiling,... ...collaborating with teams to advance AI solutions. A strong...
Performance
NVIDIA
Santa Clara, CA
4 days ago
Senior Performance Engineer, Inference
...Systems builds the world's largest AI chip, 56 times larger than... ...‑leading training and inference speeds and empowers machine learning... ...Role We are hiring a Senior Performance Analyst to join our Product... ...Collaborate with Product and Engineering to identify where competitors...
Performance
Contract work
Shift work
Cerebras
Sunnyvale, CA
2 days ago
Inference Optimization Engineer (local / edge runtime)
$170.5k - $315.49k
Inference Optimization Engineer (local / edge runtime). Locations: US, California, Santa Clara: US,... ...Mission: At Intel, our journey is to transform AI into something safer, more... ...across hardware tiers and publish honest performance comparisons. Upstream fixes and patches...
Performance
Internship
Local area
Immediate start
Shift work
Intel
Santa Clara, CA
17 hours ago
Senior Deep Learning Engineer - Model Evaluation & AI Systems
$224k - $356.5k
...into the unlimited potential of AI to define the next era of... ...at the forefront of AI and high-performance computing. As a Senior / Principal Deep Learning Engineer — Model Evaluation & AI Systems... ...Work alongside model training, inference, and product divisions to provide...
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Senior Technical Marketing Engineer - GPU and System Architecture
$160k - $253k
...centers are transforming into AI factories, and NVIDIA accelerated computing is the engine of artificial intelligence. Our... ...center platforms integrate high performance compute, networking, and a full... ...performance and efficiency for AI inference & training. What you’ll be...
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
ML Systems Engineer
...design and verification with agentic AI workflows. Our platform... ...cutting‑edge generative AI to assist engineers in RTL design, simulation, and... ...Systems Engineer to optimize the performance and efficiency of large language model inference powering our agentic AI platform...
Performance
ScOp Venture Capital
Santa Clara, CA
4 days ago
Senior AI Systems Performance Engineer: Drive SOTA Inference
A leader in AI technology in Palo Alto is seeking a Senior AI Systems Performance Engineer to optimize the latest foundation models on their innovative platform. This role involves collaborating with cross-functional teams to push the performance limits of AI systems....
Performance
SambaNova
Palo Alto, CA
1 day ago
Senior Software Engineer - Secondary Driving System
$170.6k - $261.3k
...global scale. Our Embodied AI teams are redefining what’s possible... ...stop. As a Senior Software Engineer on the Secondary Driving... ...testing, continuous integration, performance profiling, and observability... ...GPU/accelerator‑based ML inference, model deployment, and performance...
Performance
Remote work
Relocation package
Flexible hours
General Motors
Sunnyvale, CA
17 hours ago
Senior RTL Engineer for Physical AI SoC (Equity)
$200k
Velaura is seeking a Senior RTL Engineer to build next-generation Physical AI SoCs. This role involves collaboration with teams to drive microarchitectural decisions and ensure high-performance, power-efficient hardware. The ideal candidate has over 8 years of experience...
Performance
Velaura
Santa Clara, CA
4 days ago
Performance Engineer, Spatial Computing & AI Tooling
Apple Inc. is seeking a Software Performance Engineer for its Vision Products Group in Sunnyvale, California. In this role, you'll optimize AR/VR system software for high-performance and low-latency experiences. You will work with a team aiming to push the boundaries of...
Performance
Apple Inc.
Sunnyvale, CA
4 days ago
Lead Ultrasound Imaging Engineer (AI & Systems)
$165k - $180k
Lead Ultrasound Imaging Engineer (AI & Systems) iSono Health is a dynamic and rapidly growing... ...onward to ensure excellent field performance, high reliability, supply continuity, efficient... ...platforms (e.g., Jetson) for edge AI inference and image processing acceleration....
Performance
iSono Health Inc.
Sunnyvale, CA
17 hours ago
Senior GPU DL Engineer - SGLang & Multinode Inference
Advanced Micro Devices (AMD) is seeking a skilled engineer to optimize deep learning frameworks for AMD GPUs. You will enhance GPU performance, accelerate deep learning models, and work... ...an opportunity to significantly impact AI solutions while fostering innovation and...
Performance
Advanced Micro Devices
Santa Clara, CA
3 days ago
Multi-Target Runtime Engineer for AI Compute Stack
Lemurian Labs in Santa Clara seeks a Runtime Engineer to design and develop a multi-target runtime for their AI compiler stack. This role involves low-level parallelization... ...with compiler and product teams to enhance performance across diverse hardware. The ideal candidate...
Performance
Lemurian Labs
Santa Clara, CA
1 day ago
Senior SI/PI Engineer for AI Compute Hardware
$150k - $250k
MixMode in Santa Clara is looking for a Senior Staff SI/PI Engineer to ensure the electrical integrity of high-performance AI compute platforms. This role involves driving AI accelerator strategies and leading the simulation efforts for complex chip packages. Candidates...
Performance
MixMode
Santa Clara, CA
17 hours ago
Senior GPU Compiler Engineer (Hybrid), AI/ML Performance
Intel Corporation is seeking a Senior Compiler Engineer to develop and optimize compiler software for next-generation GPU architectures... ...on cutting-edge compiler technologies that enhance AI and high-performance computing performance. The ideal candidate will possess a Bachelor...
Performance
Intel
Santa Clara, CA
4 days ago
Senior GPU Performance Engineer for AI Acceleration
$207k - $300k
Google is seeking an experienced AI/ML Software Engineer to enhance GPU architectures and optimize performance benchmarks. The role involves collaborating with teams to solve ML model challenges and architect transformative AI solutions, contributing to Google's machine...
Performance
Google
Sunnyvale, CA
1 day ago
Senior GPU Math Library Engineer: AI & HPC Kernel Lead
NVIDIA Gruppe is looking for a senior engineer to join their Math Libraries team in Santa Clara... ...involves designing and implementing high-performance numerical linear algebra software on GPUs... ...opportunity to be part of cutting-edge AI and data center technologies.
Performance
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Senior MEMS Design Engineer
$180k - $260k
...As a Senior MEMS Design Engineer at nEye.ai, you will be responsible for the design, simulation, and optimization of MEMS devices that... ...Engineers to ensure our MEMS structures are not only high‑performing but also robust and reliable for high‑volume manufacturing....
Performance
nEye Systems, Inc.
Santa Clara, CA
4 days ago
Senior Systems Engineer - Hyperscale AI & Kubernetes
...Corporation is seeking a Senior Systems Software Engineer for their DGX Cloud team in Santa Clara, California. In this role, you will lead performance and scalability analysis of Kubernetes-... ..., ensuring high efficiency for AI workloads. You will work closely with researchers...
Performance
NVIDIA Corporation
Santa Clara, CA
1 day ago
Senior ASIC Power Delivery Engineer for AI/ML TPU
$163k - $237k
...Inc. is seeking an experienced candidate to shape the future of AI/ML hardware acceleration, focusing on TPU technology that... ...and verifying power delivery networks, ensuring reliability and performance in advanced designs. Expect a collaborative environment working...
Performance
Google Inc.
Sunnyvale, CA
17 hours ago
System Speed & Reliability Engineer — AI‑Driven Validation
$136k - $218.5k
NVIDIA in Santa Clara is seeking a Silicon Speed Features Engineer to co-design system-level speed features across... ...involves collaborating cross-functionally and using AI to enhance automation tools for performance validation. Ideal candidates should have a Master’s degree...
Performance
NVIDIA
Santa Clara, CA
1 day ago
Photonics/ AR Process Integration Engineer
$147k - $202.5k
...Materials is a global leader in materials engineering solutions used to produce virtually every... ...that literally connect our world – like AI and IoT. If you want to push the boundaries... .... You will collect and analyze data, perform hardware characterization, and troubleshoot...
Performance
Full time
Work experience placement
Relocation
APPLIED MATERIALS
Santa Clara, CA
5 hours ago
System Speed & Reliability Engineer — AI‑Driven Validation
NVIDIA Gruppe is seeking a Silicon Speed Features Engineer to lead validation and automation infrastructure for silicon issues. You will work across teams to ensure product quality and performance in a dynamic environment. This role requires an MS in EE or equivalent,...
Performance
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Director, Site Reliability Engineering, Sunnyvale, CA, USA
$250k
...eGain is the leader in AI knowledge management solutions for enterprises. As organizations... ...As Director of Site Reliability Engineering, you will ensure that eGain’s AI knowledge... ...platform operates with the reliability, performance, and resilience that enterprise...
Performance
Work at office
eGain Corporation
Sunnyvale, CA
4 days ago
Engineer III, SDET - AI Detection and Response (AIDR) (Hybrid)
$120k - $180k
...security with the world's most advanced AI-native platform. We work on large scale... ...About the Role You’ll work closely with engineering teams to expand test coverage across unit... ...foundation that ensures reliability and performance as we deploy AI security controls across...
Performance
Full time
Contract work
Work experience placement
Work at office
Local area
CrowdStrike Holdings, Inc.
Sunnyvale, CA
2 days ago
Senior AI Systems Engineer: Inference Kernels & Runtimes
$184k - $287.5k
NVIDIA Gruppe is seeking talented AI systems engineers to advance innovative technologies in AI inference systems software. This role involves developing cutting-edge libraries, code generators, and kernel technologies for NVIDIA's architecture, emphasizing high-impact...
NVIDIA Gruppe
Santa Clara, CA
4 days ago
