
Senior Software Engineer, Quantized Inference

$152k - $241.5k

NVIDIA

Locations: US, WA, Redmond; US, CA, Santa Clara
Time type: Full time
Posted on: Yesterday
Job requisition ID: JR2013890

We are now looking for a Senior Software Engineer for Quantized Inference!

NVIDIA is seeking software engineers to accelerate the discovery and deployment of efficient inference recipes for LLMs. A recipe defines which operators are transformed into low-precision or sparsified variants, unlocking throughput and latency gains without regressing accuracy or verbosity. Recipes may incorporate techniques such as rotations, block scaling to attenuate outlier impact, or improved calibration data drawn from SFT/RL pipelines.

Each new recipe demands corresponding kernel and model-level implementations in inference engines (vLLM, TRT-LLM, SGLang). The candidate will translate recipe specifications into functionally correct, performant code, e.g., writing Triton kernels, inserting quantize/dequantize nodes into prefill and decode paths, and ensuring per-expert scaling in MoE layers is handled correctly. From there, the candidate will collaborate with partner inference teams to further optimize throughput and interactivity on target workloads.
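To illustrate the block scaling mentioned above, here is a minimal sketch in plain Python. It is not NVIDIA's recipe or any engine's API: the function names, the symmetric int8 range (`qmax=127`), the block size of 4, and the sample weights are all assumptions chosen for illustration. The idea is that each block of values gets its own scale, so a single outlier only degrades precision inside its own block and not across the whole tensor.

```python
from typing import List, Tuple


def quantize_blockwise(
    values: List[float], block_size: int, qmax: int = 127
) -> Tuple[List[int], List[float]]:
    """Symmetric quantization with one scale per block (hypothetical helper)."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block to qmax; guard all-zero blocks.
        scale = amax / qmax if amax > 0 else 1.0
        scales.append(scale)
        for v in block:
            q = round(v / scale)
            codes.append(max(-qmax, min(qmax, q)))  # clip to the integer range
    return codes, scales


def dequantize_blockwise(
    codes: List[int], scales: List[float], block_size: int
) -> List[float]:
    """Inverse of quantize_blockwise: multiply each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(original, restored))


# Illustrative weights: small values plus one large outlier (8.0) in the second half.
weights = [0.01, -0.02, 0.03, 0.015, 0.02, -0.01, 8.0, 0.005]

# Baseline: a single scale for the whole tensor (block covers everything).
codes_t, scales_t = quantize_blockwise(weights, block_size=len(weights))
restored_t = dequantize_blockwise(codes_t, scales_t, block_size=len(weights))

# Block scaling: one scale per 4 values, so the outlier stays in its own block.
codes_b, scales_b = quantize_blockwise(weights, block_size=4)
restored_b = dequantize_blockwise(codes_b, scales_b, block_size=4)

# Compare reconstruction error on the outlier-free first block.
per_tensor_err = max_abs_error(weights[:4], restored_t[:4])
per_block_err = max_abs_error(weights[:4], restored_b[:4])
print("per-tensor error:", per_tensor_err)
print("per-block error:", per_block_err)
```

With a single tensor-wide scale, the outlier stretches the quantization grid so far that the small values in the first block all round to zero. With per-block scales they keep most of their precision. Production recipes typically store the scales in a compact format and apply them inside fused kernels, which this sketch does not attempt.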
This work is a core component of our productization effort across Megatron-LM, ModelOpt, and vLLM.

**What you'll be doing:**

* Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang)
* Own model export pipelines (ModelOpt, Megatron-LM HuggingFace), ensuring quantized checkpoints serialize correctly for downstream serving
* Build prototypes and benchmarking harnesses to evaluate recipe throughput/interactivity before full optimization
* Develop data analysis tooling and visualizations for numerics debugging
* Improve developer productivity across the team: CI, build systems, training infrastructure, pipeline friction
* Participate in code reviews and incorporate feedback

**What we need to see:**

* Proficient in Python; familiarity with C++
* Strong software engineering fundamentals: concise, well-tested code; fluent with AI-assisted tooling
* Experience with ML accelerators with a basic understanding of how certain ML layers affect execution time
* Familiarity with PyTorch internals (custom ops, autograd, export) or equivalent framework
* Experience reading, modifying, or contributing to a large open-source codebase
* MS/PhD in Computer Science or related field, or equivalent experience
* 4+ years in a relevant software engineering role
* Demonstrated ability to move fast with ambiguous requirements, with strong written and verbal communication

**Ways to stand out from the crowd:**

* Experience contributing to inference serving frameworks (vLLM, TRT-LLM, SGLang) or Triton kernel development
* Track record of debugging numerical issues across mixed-precision boundaries
* Deep experience with model compression techniques: PTQ, QAT, structured/unstructured sparsity

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 1, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 2 days ago