Senior Software Engineer, AI Inference Systems

$184k - $287.5k

NVIDIA Gruppe

Position Overview

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You’ll collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.

At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking technologies that enable anyone to harness the power of AI and benefit from its potential. Our team consists of experts in AI, systems, and performance optimization. If you’re excited to build systems, kernels, and tools that make large-scale AI faster, more efficient, and easier to deploy, we’d love to hear from you.

Responsibilities

  • Contribute features to vLLM that empower the newest models with the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) with methods like speculative decoding, data/tensor/expert/pipeline parallelism, and prefill-decode disaggregation.
  • Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization; build and extend high-level DSLs and compiler infrastructure to boost kernel developer productivity while approaching peak hardware utilization.
  • Define and build inference benchmarking methodologies and tools; contribute both new benchmarks and NVIDIA’s submissions to the industry-leading MLPerf Inference benchmarking suite.
  • Architect the scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.
  • Conduct and publish original research that pushes the Pareto frontier for the field of ML Systems; survey recent publications and find ways to integrate research ideas and prototypes into NVIDIA’s software products.

Qualifications

  • Bachelor’s degree (or equivalent experience) in Computer Science (CS), Computer Engineering (CE), or Software Engineering (SE) with 7+ years of experience; alternatively, a Master’s degree in CS/CE/SE with 5+ years of experience; or a PhD with a thesis and top-tier publications in ML Systems, GPU architecture, or high-performance computing.
  • Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
  • Solid CS fundamentals: algorithms and data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
  • Knowledgeable and passionate about performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
  • Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL; proficiency with profiling/debug tools (e.g., Nsight Systems/Compute).
  • Experience with containers and orchestration (Docker, Kubernetes, Slurm); familiarity with Linux namespaces and cgroups.
  • Excellent debugging, problem-solving, and communication skills; ability to excel in a fast-paced, multi-functional setting.

Desired Experience

  • Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang).
  • Hands-on work with ML compilers and DSLs (e.g., Triton, TorchDynamo/Inductor, MLIR/LLVM, XLA), GPU libraries (e.g., CUTLASS), and features (e.g., CUDA Graph, Tensor Cores).
  • Experience contributing to containerization/virtualization technologies such as containerd/CRI-O/CRIU.
  • Experience with cloud platforms (AWS/GCP/Azure), infrastructure as code, CI/CD, and production observability.
  • Contributions to open-source projects and/or publications; please include links to GitHub pull requests, published papers, and artifacts.

Compensation & Benefits

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 2, 2026. This posting is for an existing vacancy.

EEO Statement

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 20 hours ago
Similar jobs that could be interesting for you
Based on the Senior Software Engineer, AI Inference Systems in Santa Clara, CA vacancy
  • $184k - $356.5k

    NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...NVIDIA Gruppe is seeking a Senior System Software Engineer in Santa Clara, California, to develop world-class GPU-accelerated AI inference serving software. This role involves contributing to feature development and optimizing software for deployment in production environments... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    19 hours ago
  • $184k - $287.5k

    NVIDIA Gruppe is seeking talented AI systems engineers to advance innovative technologies in AI inference systems software. This role involves developing cutting-edge libraries, code generators, and kernel technologies for NVIDIA's architecture, emphasizing high-impact... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...Senior Software Engineer, Quantized Inference...  ...across the team: CI, build systems, training infrastructure, pipeline...  ...concise, well-tested code; fluent with AI-assisted tooling. Experience with ML... 
    Senior

    NVIDIA

    Santa Clara, CA
    20 hours ago
  • $152k - $241.5k

     ...We are looking for a Senior System Software Engineer to work on. NVIDIA is hiring software engineers for...  ...using GPUs to power a revolution in AI, enabling breakthroughs in problems from...  ...team building a highly-performant AI inference platform to make design and deployment... 
    Senior

    NVIDIA

    Santa Clara, CA
    20 hours ago
  • $152k - $241.5k

     ...NVIDIA is the platform upon which every new AI‑powered application is built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source LLM serving by...  ...they run best‑in‑class on NVIDIA GPUs and systems, and by improving the underlying stack that... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  •  ...computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture...  ...LLM and Multimodal inference at scale across multi-GPU...  ...across internal GPU software teams and engage with open...  .... THE PERSON: Skilled engineer with strong technical and... 
    Senior

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    20 hours ago
  • $152k - $241.5k

     ...driving advancements in AI and machine learning to...  ...talented and motivated engineers to join our TensorRT...  ...-leading deep learning inference software for NVIDIA AI accelerators. As a Senior Software Engineer in the...  ...Frameworks, Compilers, or System Software. ~ Excellent... 
    Senior

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $152k - $241.5k

     ...We are now looking for a Senior Software Engineer for Deep Learning Inference! Would you like to make a big impact in Deep...  ...the crowd: Experience developing System Software. Proficiency in Python as...  ...for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.... 
    Senior

    NVIDIA

    Santa Clara, CA
    20 hours ago
  •  ...computing experiences-from AI and data centers, to...  ...gaming and embedded systems. Grounded in a...  ...THE ROLE: As a senior member of the LLM inference framework team, you...  ...intersection of inference engines, distributed systems...  ...development Software Engineering ~... 
    Senior

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    2 days ago
  •  ...Advanced Micro Devices is seeking a strategic software engineering lead in Santa Clara, California. This role involves improving application...  .... Key responsibilities include developing techniques for inference optimization and supporting the ROCm ecosystem expansion. A... 
    Senior

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    20 hours ago
  • $165k - $242k

     ...Senior Software Engineer II, Inference Sunnyvale, CA / Bellevue, WA CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology...  ...experience building distributed systems or cloud services. ~ Strong coding... 
    Senior
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours
    Shift work

    CoreWeave

    Sunnyvale, CA
    3 days ago
  • $139k - $204k

     ...Senior Software Engineer I, Inference Sunnyvale, CA / Bellevue, WA CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology...  ...experience building distributed systems or cloud services. Computer Science... 
    Senior
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours
    Shift work

    CoreWeave

    Sunnyvale, CA
    3 days ago
  • $168k - $270.25k

     ...Senior Software Engineer, Distributed Systems - NIM Factory...  ...the platform upon which every new AI-powered application is built. We are...  ...infrastructure and automation for NVIDIA Inference Microservices (NIMs). The right... 
    Senior
    Remote work

    NVIDIA

    Santa Clara, CA
    20 hours ago
  • $152k - $241.5k

     ...NVIDIA is seeking a highly motivated Software Engineer to join our growing AI and Generative AI engineering team....  ..., and evaluation of large-scale AI systems powering next‑generation...  ...infrastructure for large‑scale ML training, inference, and generative AI systems. Build distributed... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    19 hours ago
  • $248.71k - $292.6k

    About Groq Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale...  ...reach, anything is possible. Build fast. Sr. Staff Software Engineer - High Performance GPU Inference Systems Mission Push... 
    Senior

    Groq

    Palo Alto, CA
    4 days ago
  • $165k - $242k

     ...CoreWeave is The Essential Cloud for AI™. Built for pioneers by...  ...more at What You'll Do: Senior engineers are area owners who lead...  ...evolve our Kubernetes-native inference platform and meet strict P99...  ...experience building distributed systems or cloud services. ~ Strong... 
    Senior
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Flexible hours
    Shift work

    CoreWeave

    Sunnyvale, CA
    18 days ago
  • $170.6k - $261.3k

     ...hardware and battery systems to intuitive design, intelligent software, and next-generation safety...  ...scale. Our Embodied AI teams are redefining...  ...to a safe stop. As a Senior Software Engineer on the Secondary Driving...  .../accelerator‑based ML inference, model deployment, and... 
    Senior
    Remote work
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    20 hours ago
  • $152k - $241.5k

     ...eager to work on cutting-edge AI technology for safety-...  ...NVIDIA's TensorRT team as a Senior Software Engineer, and be at the forefront of...  ...enabling high-performance AI inference solutions for automotive safety...  ...of functions, classes, and systems to support certification and... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...generation computing experiences-from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of...  ...is looking for a strategic software engineering lead who is passionate about improving...  ...scale-up and scale-out inference. Develop methods and... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    1 day ago
  • $384k

     ...NVIDIA is seeking a Senior Director, System Software Engineering, to lead strategy and execution for capacity management...  ...foundation for NVIDIA's internal AI research clusters. This leader will...  ..., or large-scale training and inference environments. Experience leading... 
    Senior

    NVIDIA

    Santa Clara, CA
    2 days ago
  •  ...computing experiences-from AI and data centers, to...  ...gaming and embedded systems. Grounded in a...  ...is looking for a Senior Staff AI Infra Engineer who is passionate about...  ...of hardware and software to optimize performance...  ...accelerate LLM training and inference on AMD GPUs,... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
  • $230k - $250k

    Cerebras Systems is seeking a Sr. Member of Technical Staff in Sunnyvale, CA. This role involves designing resilient software features for cloud-based AI inference, leveraging AWS tools and services. Candidates should have a Master’s degree in Computer Science and experience... 
    Senior

    Cerebras Systems

    Sunnyvale, CA
    3 days ago
  • $152k - $241.5k

     ...passionate about redefining how software is built in the age of Generative AI? Join NVIDIA’s TensorRT team...  ...entry point for out-of-framework inference globally. We are moving beyond...  ...scale. If you are a systems‑thinking C++ engineer who wants to help scale out an... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    20 hours ago
  • $184k - $287.5k

     ...NVIDIA Gruppe in Santa Clara is seeking an AI Systems Engineer to innovate and develop cutting-edge technologies in the AI inference software stack. Candidates should hold a Master's degree and possess over 6 years of experience in ML/DL systems development. The role... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    20 hours ago
  •  ...A leading technology company is seeking a Senior AI Software Engineer to join their team in Santa Clara, California. In this role, you will innovate and develop groundbreaking AI systems software for inference applications including deep learning framework optimizations... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $184k - $287.5k

     ...NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and optimize...  ...accelerated software that powers today’s most sophisticated AI applications. Our team is responsible for... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    19 hours ago
  • $181.1k - $318.4k

     ...Senior Software Engineer, On-Device Health Agentic Systems Cupertino, California, United States Hardware We are seeking a senior...  ...primary interface for on‑device AI. A key part of your role will be...  ...the on‑device components for LLM inference, context management, and tool... 
    Senior
    Relocation

    Apple

    Cupertino, CA
    19 hours ago
  • Acceler8 Talent is seeking a Senior / Principal Machine Learning Engineer specializing in inference serving frameworks. This role involves...  ...multi-node inference systems, enhancing resource scheduling...  ...environment to work on cutting-edge AI infrastructure. 
    Senior

    Acceler8 Talent

    Santa Clara, CA
    1 day ago
  •  ...Time · Department: Backend Engineer · Work type: On-Site About Archetype AI Archetype AI is...  ...and resilient distributed systems. You’ll work closely with...  ...throughput, low-latency AI model inference and data services....  ...+ years of professional software engineering experience,... 
    Senior
    Full time

    Neara

    Palo Alto, CA
    19 hours ago
