Senior Software Development Engineer - LLM Inference Framework

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING


At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

As a senior member of the LLM inference framework team, you will be responsible for building and optimizing production-grade single-node and distributed inference runtimes for large language models on AMD GPUs. You will work at the framework and runtime layer, driving performance, scalability, and reliability, enabling tensor parallelism, pipeline parallelism, expert parallelism (MoE), and single-node or multi-node inference at scale. Your work will directly power customer-facing deployments and benchmarking platforms (e.g., InferenceMax, MLPerf, strategic partners, and cloud providers) and will be upstreamed into open-source inference frameworks such as vLLM and SGLang to make AMD a first-class platform for LLM serving.


This role sits at the intersection of inference engines, distributed systems, and GPU runtime and kernel backends.


THE PERSON:

You are a systems-minded ML engineer who thinks in terms of throughput, latency, memory movement, and scheduling, not just model code.


You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating with kernel, compiler, and networking teams to close end-to-end performance gaps.


You enjoy working in open source and driving architecture-level improvements in inference platforms.


KEY RESPONSIBILITIES:

Inference Framework & Runtime

  • Architect and optimize distributed LLM inference runtimes based on in-house LLM engines or open-source stacks such as vLLM, SGLang, and llm-d
  • Design and improve TP / PP / EP (MoE) hybrid execution, including KV-cache management, attention dispatch, and token scheduling
  • Implement and optimize multi-node inference pipelines using RCCL, RDMA, and collective-based execution
Performance & Scalability
  • Drive throughput, latency, and memory efficiency across single-GPU and multi-GPU clusters
  • Optimize continuous batching, speculative decoding, KV-cache paging, prefix caching, and multi-turn serving
GPU & Backend Integration
  • Work with AMD GPU libraries (AITER, hipBLASLt, RCCL, ROCm runtime) to ensure inference frameworks efficiently use FP8 / FP4 GEMM and FlashAttention / MLA
  • Collaborate with compiler teams (Triton, LLVM, ROCm) to unblock framework-level performance
Open Source & Customer Enablement
  • Upstream features and performance fixes into vLLM, SGLang, and llm-d
  • Enable customer PoCs and production deployments on AMD platforms
  • Build and maintain benchmark-grade inference pipelines

PREFERRED EXPERIENCE:

Inference Stack Knowledge
  • Hands-on understanding of vLLM, SGLang, or similar inference stacks
  • Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects
Deep Learning Integration
  • Strong experience integrating optimized GPU performance into machine-learning frameworks (e.g., PyTorch, TensorFlow) for high-throughput and scalable inference
Kernel & Inference Frameworks
  • Strong background in NVIDIA, AMD, or similar GPU architectures and kernel development
Software Engineering
  • Expertise in Python, preferably with experience in C/C++, including debugging, performance tuning, and test design for large-scale systems
High-Performance Computing
  • Experience running large-scale workloads on heterogeneous GPU clusters, optimizing for efficiency and scalability
Compiler & Runtime Optimization
  • Understanding of compiler and runtime systems, including LLVM, ROCm, and GPU code generation

ACADEMIC CREDENTIALS:
  • Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field.


Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.
Vacancy posted 3 days ago