Senior Software Engineer, Quantized Inference
$152k - $241.5k · NVIDIA
## Senior Software Engineer, Quantized Inference

* locations: US, WA, Redmond; US, CA, Santa Clara
* time type: Full time
* posted on: Posted Yesterday
* job requisition id: JR2013890

We are now looking for a Senior Software Engineer for Quantized Inference!

NVIDIA is seeking software engineers to accelerate the discovery and deployment of efficient inference recipes for LLMs. A recipe defines which operators are transformed into low-precision or sparsified variants, unlocking throughput and latency gains without regressing accuracy or verbosity. Recipes may incorporate techniques such as rotations, block scaling to attenuate outlier impact, or improved calibration data drawn from SFT/RL pipelines.

Each new recipe demands corresponding kernel and model-level implementations in inference engines (vLLM, TRT-LLM, SGLang). The candidate will translate recipe specifications into functionally correct, performant code, e.g., writing Triton kernels, inserting quantize/dequantize nodes into prefill and decode paths, and ensuring per-expert scaling in MoE layers is handled correctly. From there, the candidate will collaborate with partner inference teams to further optimize throughput and interactivity on target workloads.

This work is a core component of our productization effort across Megatron-LM, ModelOpt, and vLLM.

**What you'll be doing:**

* Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang)
* Own model export pipelines (ModelOpt, Megatron-LM HuggingFace), ensuring quantized checkpoints serialize correctly for downstream serving
* Build prototypes and benchmarking harnesses to evaluate recipe throughput/interactivity before full optimization
* Develop data analysis tooling and visualizations for numerics debugging
* Improve developer productivity across the team: CI, build systems, training infrastructure, pipeline friction
* Participate in code reviews and incorporate feedback

**What we need to see:**

* Proficient in Python; familiarity with C++
* Strong software engineering fundamentals: concise, well-tested code; fluent with AI-assisted tooling
* Experience with ML accelerators with a basic understanding of how certain ML layers affect execution time
* Familiarity with PyTorch internals (custom ops, autograd, export) or equivalent framework
* Experience reading, modifying, or contributing to a large open-source codebase
* MS/PhD in Computer Science or related field, or equivalent experience
* 4+ years in a relevant software engineering role
* Demonstrated ability to move fast with ambiguous requirements, with strong written and verbal communication

**Ways to stand out from the crowd:**

* Experience contributing to inference serving frameworks (vLLM, TRT-LLM, SGLang) or Triton kernel development
* Track record of debugging numerical issues across mixed-precision boundaries
* Deep experience with model compression techniques: PTQ, QAT, structured/unstructured sparsity

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 1, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
- A leading technology firm is seeking a Senior Software Engineer for Quantized Inference to implement quantized recipes for advanced model optimization. This role demands strong skills in Python and C++, alongside experience in ML accelerators and software engineering fundamentals... (Tags: Senior)
- $139k - $204k: ...What You’ll Do Senior engineers are area owners who lead designs, raise engineering standards, and deliver measurable improvements to... ...orchestration, and hardware teams to evolve our Kubernetes‑native inference platform and meet strict P99 SLAs at scale. About The... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work)
- $165k - $242k: ...Apply for the Senior Software Engineer II, Inference role at CoreWeave. CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work)
- $139k - $204k: ...Senior Software Engineer I, Inference Sunnyvale, CA / Bellevue, WA CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence.... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work)
- $152k - $241.5k: ...NVIDIA is the platform upon which every new AI‑powered application is built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source LLM serving by contributing directly to upstream inference engines like vLLM and SGLang, ensuring they run best‑in... (Tags: Senior)
- $152k - $241.5k: ...Senior Software Engineer – Deep Learning Inference What you’ll be doing: Craft and develop robust inferencing software that can be scaled to multiple platforms for functionality and performance Develop components of TensorRT, NVIDIA’s SDK for high-performance deep learning... (Tags: Senior)
- ...RL training and SOTA LLM and Multimodal inference at scale across multi-GPU and multi-node... ...You will collaborate across internal GPU software teams and engage with open-source... ...THE PERSON: Skilled engineer with strong technical and analytical expertise... (Tags: Senior)
- ...your career. THE ROLE: As a senior member of the LLM inference framework team, you will be responsible... ...at the intersection of inference engines, distributed systems, and GPU runtime... ...and kernel development Software Engineering ~ Expertise in Python... (Tags: Senior)
- $184k - $287.5k: NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and optimize the GPU-accelerated software that powers today’s most sophisticated AI applications. Our team is responsible... (Tags: Senior)
- $184k - $287.5k: Position Overview We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and... (Tags: Senior)
- ...Advanced Micro Devices is seeking a strategic software engineering lead in Santa Clara, California. This role involves improving application... .... Key responsibilities include developing techniques for inference optimization and supporting the ROCm ecosystem expansion. A... (Tags: Senior)
- $152k - $241.5k: ...NVIDIA Gruppe is seeking a Senior Software Engineer – AI Inference in Santa Clara, California. This role involves enhancing open-source LLM serving optimizations and implementing high-performance runtime capabilities. Candidates should have 5+ years of experience in building... (Tags: Senior)
- A leading technology company is seeking a Senior System Software Engineer to develop GPU-accelerated AI inference serving software. The ideal candidate will have over 5 years of experience with deep learning software, strong skills in Rust and C++, and a collaborative approach... (Tags: Senior)
- $165k - $242k: ...Nasdaq: CRWV) in March 2025. Learn more at What You'll Do: Senior engineers are area owners who lead designs, raise engineering... ...orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale. About the role... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Flexible hours, Shift work)
- $152k - $241.5k: ...seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies. The role involves... ...Required qualifications include a relevant degree and significant software development experience in Python or C++. A deep understanding... (Tags: Senior)
- $184k - $287.5k: NVIDIA Gruppe in Santa Clara is seeking an AI Systems Engineer to innovate and develop cutting-edge technologies in the AI inference software stack. Candidates should hold a Master's degree and possess over 6 years of experience in ML/DL systems development. The role involves... (Tags: Senior)
- $184k - $287.5k: NVIDIA Gruppe is seeking talented AI systems engineers to advance innovative technologies in AI inference systems software. This role involves developing cutting-edge libraries, code generators, and kernel technologies for NVIDIA's architecture, emphasizing high-impact... (Tags: Senior)
- $119.8k - $234.7k: ...to Fortune 500 enterprises. Our converged AI fabric delivers inference capabilities for all LLMs in Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more. As a Senior Software Engineer, you will shape the future of one of the largest and fastest... (Tags: Senior, Ongoing contract, Local area)
- $184k - $356.5k: NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming... (Tags: Senior)
- $230k - $250k: Cerebras Systems is seeking a Sr. Member of Technical Staff in Sunnyvale, CA. This role involves designing resilient software features for cloud-based AI inference, leveraging AWS tools and services. Candidates should have a Master’s degree in Computer Science and experience... (Tags: Senior)
- NVIDIA Gruppe is seeking a Senior System Software Engineer in Santa Clara, California, to develop world-class GPU-accelerated AI inference serving software. This role involves contributing to feature development and optimizing software for deployment in production environments... (Tags: Senior)
- ...We’re looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic search, retrieval, and... ...backend or infrastructure systems at scale ~ Strong software engineering skills in languages such as Go, Rust,... (Tags: Senior, Local area, Worldwide)
- General Motors is seeking a Senior ML Infrastructure Engineer to build and scale a robust platform for machine learning inference workflows. You will design backend software components, collaborate with ML engineers, and lead initiatives across GM's ML ecosystem. With over... (Tags: Senior, Remote job)
- $152k - $241.5k: ...Design, build, and optimize containerized inference execution for the latest 3D VLMs from... ...production‑grade, highly optimized software (NIMs, NVIDIA Inference Microservices)... ...Computer Science + 3 years, or Electrical Engineering, Bachelor of Science (or equivalent experience... (Tags: Senior)
- ...limits of real‑time large language model inference? Join NVIDIA’s TensorRT Edge‑LLM team... ...automotive and robotics. We build the software stack that enables Large Language, Vision... ...Computer Science, Electrical/Computer Engineering, or a closely related field. 4+ years of... (Tags: Senior)
- $184k - $287.5k: ...the core of modern AI infrastructure, from training large-scale models to running inference in production. That position depends on software as much as hardware, and compiler engineering is a big part of what makes it work. What you’ll be doing: Designing and implementing... (Tags: Senior, Work experience placement)
- $128.7k - $261.3k: The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks... ...GitHub Copilot, or equivalent) as part of your engineering workflow. Experience designing clean, well-tested software with clear interfaces and good abstractions.... (Tags: Senior, Flexible hours)
- $139k - $204k: ...training clusters, agent building, and inference at scale, we’re combining forces to serve... ...problems at the intersection of software, hardware, and AI, there's never been... ...What You’ll Do Reporting to the Engineering Manager for ML Verticals, you will help... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours)
- $182k - $242k: ...maintaining benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, and... ...to architecture decisions within the team. Break down engineering tasks into clear milestones and deliver reliable, high-quality... (Tags: Senior, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours)
- ...backend services and APIs that support model inference, orchestration, and tool execution.... ...into working features. Collaborate with ML engineers to integrate, evaluate, and... ...Qualifications 3-5 years of professional software engineering experience. Strong experience... (Tags: Senior)
- software engineer internship remote Santa Clara, CA
- new grad software engineer Santa Clara, CA
- software engineer staff Santa Clara, CA
- integration software engineer Santa Clara, CA
- machine learning software engineer Santa Clara, CA
- senior robotics software engineer Santa Clara, CA
- software engineer entry level Santa Clara, CA
- software development engineer aws Santa Clara, CA
- startup software engineer Santa Clara, CA
- rust software engineer Santa Clara, CA

