Senior Software Development Engineer - LLM Inference Framework

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

As a senior member of the LLM inference framework team, you will be responsible for building and optimizing production-grade single-node and distributed inference runtimes for large language models on AMD GPUs. You will work at the framework and runtime layer, driving performance, scalability, and reliability, enabling tensor parallelism, pipeline parallelism, expert parallelism (MoE), and single-node or multi-node inference at scale. Your work will directly power customer-facing deployments and benchmarking platforms (e.g., InferenceMax, MLPerf, strategic partners, and cloud providers) and will be upstreamed into open-source inference frameworks such as vLLM and SGLang to make AMD a first-class platform for LLM serving.

This role sits at the intersection of inference engines, distributed systems, and GPU runtime and kernel backends.

THE PERSON:

You are a systems-minded ML engineer who thinks in terms of throughput, latency, memory movement, and scheduling, not just model code.

You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating with kernel, compiler, and networking teams to close end-to-end performance gaps.

You enjoy working in open source and driving architecture-level improvements in inference platforms.

KEY RESPONSIBILITIES:

Inference Framework & Runtime

Architect and optimize distributed LLM inference runtimes based on in-house LLM engines or open-source stacks such as vLLM, SGLang, and llm-d

Design and improve TP / PP / EP (MoE) hybrid execution, including KV-cache management, attention dispatch, and token scheduling

Implement and optimize multi-node inference pipelines using RCCL, RDMA, and collective-based execution

Performance & Scalability

Drive throughput, latency, and memory efficiency across single-GPU systems and multi-GPU clusters

Optimize continuous batching, speculative decoding, KV-cache paging, prefix caching, and multi-turn serving

GPU & Backend Integration

Work with AMD GPU libraries (AITER, hipBLASLt, RCCL, ROCm runtime) to ensure inference frameworks efficiently use FP8 / FP4 GEMM and FlashAttention / MLA

Collaborate with compiler teams (Triton, LLVM, ROCm) to unblock framework-level performance

Open Source & Customer Enablement

Upstream features and performance fixes into vLLM, SGLang, and llm-d

Enable customer PoCs and production deployments on AMD platforms

Build and maintain benchmark-grade inference pipelines

PREFERRED EXPERIENCE:

Inference Stack Knowledge

Hands-on understanding of vLLM, SGLang, or similar inference stacks

Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects

Deep Learning Integration

Strong experience integrating optimized GPU performance into machine-learning frameworks (e.g., PyTorch, TensorFlow) for high-throughput and scalable inference

Kernel & Inference Frameworks

Strong background in NVIDIA, AMD, or similar GPU architectures and kernel development

Software Engineering

Expertise in Python, preferably with experience in C/C++, including debugging, performance tuning, and test design for large-scale systems

High-Performance Computing

Experience running large-scale workloads on heterogeneous GPU clusters, optimizing for efficiency and scalability

Compiler & Runtime Optimization

Understanding of compiler and runtime systems, including LLVM, ROCm, and GPU code generation

ACADEMIC CREDENTIALS:

Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field.

Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.

Vacancy posted 3 days ago
