Senior Software Engineer - AI Inference

$152k - $241.5k

NVIDIA

NVIDIA is the platform upon which every new AI‑powered application is built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source LLM serving by contributing directly to upstream inference engines like vLLM and SGLang, ensuring they run best‑in‑class on NVIDIA GPUs and systems, and by improving the underlying stack that enables high‑throughput, low‑latency inference at scale.

This is a hands-on role for an engineer who enjoys digging into performance bottlenecks, designing pragmatic runtime improvements, and shipping high‑quality changes that are broadly useful to the community and production deployments.

What you'll be doing:

  • Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion.

  • Implement and optimize inference‑runtime capabilities: batching and scheduling policies, streaming, request lifecycle management, and KV‑cache efficiency (paging/sharding) to improve throughput and tail latency.

  • Profile and improve hot paths across layers, from Python orchestration to C++/CUDA kernels, using data to guide optimization work.

  • Improve multi‑GPU inference performance and reliability: parallelism strategies, communication patterns, and resource utilization across NVIDIA platforms.

  • Build and maintain performance and correctness regression tests to prevent slowdowns and ensure stable behavior across model and hardware configurations.

  • Collaborate with model, platform, and SRE teams to translate production requirements into upstreamable solutions with strong operability and maintainability.

What we need to see:

  • 5+ years building production software with solid systems engineering fundamentals and a track record of delivering performance or reliability improvements.

  • Experience with LLM inference/serving stacks (e.g., vLLM, SGLang) and an understanding of the tradeoffs that drive real production performance.

  • Strong programming skills in Python plus C++ and/or CUDA; ability to debug and optimize performance‑critical code.

  • Experience with profiling and performance investigation (microbenchmarks, flame graphs, GPU profiling) and a measurement‑driven mindset.

  • Familiarity with distributed systems concepts and concurrency (queues/schedulers, multi‑process/multi‑threading, scaling across GPUs/nodes).

  • Strong communication skills and comfort working with open‑source communities (issues, PR discussions, code review).

  • BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience).

Ways to stand out from the crowd:

  • Open‑source contributions to vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo or adjacent serving/runtime projects.

  • Shipped performance work such as improved attention/KV cache efficiency, speculative decoding, scheduler improvements, quantization-aware serving, or streaming latency reductions.

  • Experience building reproducible benchmarking and performance regression infrastructure for latency/throughput.

  • Systems performance background spanning memory bandwidth, kernel fusion, PCIe/NVLink effects, and network fabrics (e.g., InfiniBand).

We are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward‑thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 18, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 3 days ago