Senior Software Engineer, Quantized Inference

$152k - $241.5k

NVIDIA

## Senior Software Engineer, Quantized Inference

* Locations: US, WA, Redmond; US, CA, Santa Clara
* Time type: Full time
* Posted on: Yesterday
* Job requisition ID: JR2013890

We are now looking for a Senior Software Engineer for Quantized Inference!

NVIDIA is seeking software engineers to accelerate the discovery and deployment of efficient inference recipes for LLMs. A recipe defines which operators are transformed into low-precision or sparsified variants, unlocking throughput and latency gains without regressing accuracy or verbosity. Recipes may incorporate techniques such as rotations, block scaling to attenuate outlier impact, or improved calibration data drawn from SFT/RL pipelines.

Each new recipe demands corresponding kernel and model-level implementations in inference engines (vLLM, TRT-LLM, SGLang). The candidate will translate recipe specifications into functionally correct, performant code, e.g., writing Triton kernels, inserting quantize/dequantize nodes into prefill and decode paths, and ensuring per-expert scaling in MoE layers is handled correctly. From there, the candidate will collaborate with partner inference teams to further optimize throughput and interactivity on target workloads.

This work is a core component of our productization effort across Megatron-LM, ModelOpt, and vLLM.

**What you'll be doing:**

* Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang)
* Own model export pipelines (ModelOpt, Megatron-LM HuggingFace), ensuring quantized checkpoints serialize correctly for downstream serving
* Build prototypes and benchmarking harnesses to evaluate recipe throughput/interactivity before full optimization
* Develop data analysis tooling and visualizations for numerics debugging
* Improve developer productivity across the team: CI, build systems, training infrastructure, pipeline friction
* Participate in code reviews and incorporate feedback

**What we need to see:**

* Proficient in Python; familiarity with C++
* Strong software engineering fundamentals: concise, well-tested code; fluent with AI-assisted tooling
* Experience with ML accelerators with a basic understanding of how certain ML layers affect execution time
* Familiarity with PyTorch internals (custom ops, autograd, export) or equivalent framework
* Experience reading, modifying, or contributing to a large open-source codebase
* MS/PhD in Computer Science or related field, or equivalent experience
* 4+ years in a relevant software engineering role
* Demonstrated ability to move fast with ambiguous requirements, with strong written and verbal communication

**Ways to stand out from the crowd:**

* Experience contributing to inference serving frameworks (vLLM, TRT-LLM, SGLang) or Triton kernel development
* Track record of debugging numerical issues across mixed-precision boundaries
* Deep experience with model compression techniques: PTQ, QAT, structured/unstructured sparsity

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 1, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 2 days ago
