Senior Software Engineer - AI Inference

$152k - $241.5k

NVIDIA Gruppe

NVIDIA is the platform upon which every new AI‑powered application is built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source LLM serving by contributing directly to upstream inference engines like vLLM and SGLang, ensuring they run best‑in‑class on NVIDIA GPUs and systems, and by improving the underlying stack that enables high‑throughput, low‑latency inference at scale. This is a hands‑on role for an engineer who enjoys digging into performance bottlenecks, designing pragmatic runtime improvements, and shipping high‑quality changes that are broadly useful to the community and production deployments.

What you’ll be doing:
Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion.
Implement and optimize inference‑runtime capabilities: batching and scheduling policies, streaming, request lifecycle management, and KV‑cache efficiency (paging/sharding) to improve throughput and tail latency.
Profile and improve hot paths across layers, from Python orchestration to C++/CUDA kernels, using data to guide optimization work.
Improve multi‑GPU inference performance and reliability: parallelism strategies, communication patterns, and resource utilization across NVIDIA platforms.
Build and maintain performance and correctness regression tests to prevent slowdowns and ensure stable behavior across model and hardware configurations.
Collaborate with model, platform, and SRE teams to translate production requirements into upstreamable solutions with strong operability and maintainability.

What we need to see:
5+ years building production software with solid systems engineering fundamentals and a track record of delivering performance or reliability improvements.
Experience with LLM inference/serving stacks (e.g., vLLM, SGLang) and an understanding of the tradeoffs that drive real production performance.
Strong programming skills in Python plus C++ and/or CUDA; ability to debug and optimize performance‑critical code.
Experience with profiling and performance investigation (microbenchmarks, flame graphs, GPU profiling) and a measurement‑driven mindset.
Familiarity with distributed systems concepts and concurrency (queues/schedulers, multi‑process/multi‑threading, scaling across GPUs/nodes).
Strong communication skills and comfort working with open‑source communities (issues, PR discussions, code review).
BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience).

Ways to stand out from the crowd:
Open‑source contributions to vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo or adjacent serving/runtime projects.
Shipped performance work such as improved attention/KV cache efficiency, speculative decoding, scheduler improvements, quantization‑aware serving, or streaming latency reductions.
Experience building reproducible benchmarking and performance regression infrastructure for latency/throughput.
Systems performance background spanning memory bandwidth, kernel fusion, PCIe/NVLink effects, and network fabrics (e.g., InfiniBand).

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is $152,000 USD - $241,500 USD for Level 3, and $184,000 USD - $287,500 USD for Level 4. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 18, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 1 day ago

