Senior Software Development Engineer - LLM Inference Framework

Advanced Micro Devices, Inc.

What You Do At AMD Changes Everything

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.

Together, we advance your career.

The Role

As a senior member of the LLM inference framework team, you will be responsible for building and optimizing production-grade single-node and distributed inference runtimes for large language models on AMD GPUs. You will work at the framework and runtime layer, driving performance, scalability, and reliability, enabling tensor parallelism, pipeline parallelism, expert parallelism (MoE), and single-node or multi-node inference at scale. Your work will directly power customer-facing deployments and benchmarking platforms (e.g., InferenceMax, MLPerf, strategic partners, and cloud providers) and will be upstreamed into open-source inference frameworks such as vLLM and SGLang to make AMD a first-class platform for LLM serving.

This role sits at the intersection of inference engines, distributed systems, and GPU runtime and kernel backends.

The Person

You are a systems-minded ML engineer who thinks in terms of throughput, latency, memory movement, and scheduling, not just model code. You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating with kernel, compiler, and networking teams to close end-to-end performance gaps. You enjoy working in open source and driving architecture-level improvements in inference platforms.

Key Responsibilities

Inference Framework & Runtime

Architect and optimize distributed LLM inference runtimes based on in-house LLM engines or open-source stacks such as vLLM, SGLang, and llm-d
Design and improve TP / PP / EP (MoE) hybrid execution, including KV-cache management, attention dispatch, and token scheduling
Implement and optimize multi-node inference pipelines using RCCL, RDMA, and collective-based execution

Performance & Scalability

Drive throughput, latency, and memory efficiency across single-GPU systems and multi-GPU clusters
Optimize continuous batching, speculative decoding, KV-cache paging, prefix caching, and multi-turn serving

GPU & Backend Integration

Work with AMD GPU libraries (AITER, hipBLASLt, RCCL, ROCm runtime) to ensure inference frameworks make efficient use of FP8 / FP4 GEMM and FlashAttention / MLA
Collaborate with compiler teams (Triton, LLVM, ROCm) to unblock framework-level performance

Open Source & Customer Enablement

Upstream features and performance fixes into vLLM, SGLang, and llm-d
Enable customer PoCs and production deployments on AMD platforms
Build and maintain benchmark-grade inference pipelines

Preferred Experience

Inference Stack Knowledge

Hands-on understanding of vLLM, SGLang, or similar inference stacks
Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects

Deep Learning Integration

Strong experience integrating optimized GPU performance into machine-learning frameworks (e.g., PyTorch, TensorFlow) for high-throughput and scalable inference

Kernel & Inference Frameworks

Strong background in NVIDIA, AMD, or similar GPU architectures and kernel development

Software Engineering

Expertise in Python, preferably with experience in C/C++, including debugging, performance tuning, and test design for large-scale systems

High-Performance Computing

Experience running large-scale workloads on heterogeneous GPU clusters, optimizing for efficiency and scalability

Compiler & Runtime Optimization

Understanding of compiler and runtime systems, including LLVM, ROCm, and GPU code generation

Academic Credentials

Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field.

Vacancy posted 13 hours ago
