Senior Software Development Engineer - LLM Inference Framework

Advanced Micro Devices, Inc.

What You Do At AMD Changes Everything

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.

Together, we advance your career.

The Role

As a senior member of the LLM inference framework team, you will be responsible for building and optimizing production-grade single-node and distributed inference runtimes for large language models on AMD GPUs. You will work at the framework and runtime layer, driving performance, scalability, and reliability, enabling tensor parallelism, pipeline parallelism, expert parallelism (MoE), and single-node or multi-node inference at scale. Your work will directly power customer-facing deployments and benchmarking platforms (e.g., InferenceMAX, MLPerf, strategic partners, and cloud providers) and will be upstreamed into open-source inference frameworks such as vLLM and SGLang to make AMD a first-class platform for LLM serving.

This role sits at the intersection of inference engines, distributed systems, and GPU runtime and kernel backends.

The Person

You are a systems-minded ML engineer who thinks in terms of throughput, latency, memory movement, and scheduling, not just model code. You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating with kernel, compiler, and networking teams to close end-to-end performance gaps. You enjoy working in open source and driving architecture-level improvements in inference platforms.

Key Responsibilities

Inference Framework & Runtime

  • Architect and optimize distributed LLM inference runtimes based on in-house LLM engines or open-source stacks such as vLLM, SGLang, and llm-d
  • Design and improve TP / PP / EP (MoE) hybrid execution, including KV-cache management, attention dispatch, and token scheduling
  • Implement and optimize multi-node inference pipelines using RCCL, RDMA, and collective-based execution

Performance & Scalability

  • Drive throughput, latency, and memory efficiency across single-GPU systems and multi-GPU clusters
  • Optimize continuous batching, speculative decoding, KV-cache paging, prefix caching, and multi-turn serving

GPU & Backend Integration

  • Work with AMD GPU libraries (AITER, hipBLASLt, RCCL, ROCm runtime) to ensure inference frameworks efficiently use FP8 / FP4 GEMM and FlashAttention / MLA
  • Collaborate with compiler teams (Triton, LLVM, ROCm) to unblock framework-level performance

Open Source & Customer Enablement

  • Upstream features and performance fixes into vLLM, SGLang, and llm-d
  • Enable customer PoCs and production deployments on AMD platforms
  • Build and maintain benchmark-grade inference pipelines

Preferred Experience

Inference Stack Knowledge

  • Hands-on understanding of vLLM, SGLang, or similar inference stacks
  • Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects

Deep Learning Integration

  • Strong experience integrating optimized GPU performance into machine-learning frameworks (e.g., PyTorch, TensorFlow) for high-throughput and scalable inference

Kernel & Inference Frameworks

  • Strong background in NVIDIA, AMD, or similar GPU architectures and kernel development

Software Engineering

  • Expertise in Python and preferably experience in C/C++, including debugging, performance tuning, and test design for large-scale systems

High-Performance Computing

  • Experience running large-scale workloads on heterogeneous GPU clusters, optimizing for efficiency and scalability

Compiler & Runtime Optimization

  • Understanding of compiler and runtime systems, including LLVM, ROCm, and GPU code generation

Academic Credentials

  • Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field
Vacancy posted 13 hours ago