Senior Software Development Engineer - LLM Inference Framework
Advanced Micro Devices, Inc.
What You Do At AMD Changes Everything
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
The Role
As a senior member of the LLM inference framework team, you will build and optimize production-grade single-node and distributed inference runtimes for large language models on AMD GPUs. Working at the framework and runtime layer, you will drive performance, scalability, and reliability, and enable tensor parallelism, pipeline parallelism, expert parallelism (MoE), and single-node and multi-node inference at scale. Your work will directly power customer-facing deployments with strategic partners and cloud providers, as well as benchmarking platforms such as InferenceMax and MLPerf. It will also be upstreamed into open-source inference frameworks such as vLLM and SGLang to make AMD a first-class platform for LLM serving.
This role sits at the intersection of inference engines, distributed systems, and GPU runtime and kernel backends.
The Person
You are a systems-minded ML engineer who thinks in terms of throughput, latency, memory movement, and scheduling, not just model code. You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating with kernel, compiler, and networking teams to close end-to-end performance gaps. You enjoy working in open source and driving architecture-level improvements in inference platforms.
Key Responsibilities
Inference Framework & Runtime
- Architect and optimize distributed LLM inference runtimes based on in-house LLM engines or open-source stacks such as vLLM, SGLang, and llm-d
- Design and improve TP / PP / EP (MoE) hybrid execution, including KV-cache management, attention dispatch, and token scheduling
- Implement and optimize multi-node inference pipelines using RCCL, RDMA, and collective-based execution
Performance & Scalability
- Drive improvements in throughput, latency, and memory efficiency, from single GPUs to multi-GPU clusters
- Optimize continuous batching, speculative decoding, KV-cache paging, prefix caching, and multi-turn serving
GPU & Backend Integration
- Work with AMD GPU libraries (AITER, hipBLASLt, RCCL, ROCm runtime) to ensure inference frameworks make efficient use of FP8 / FP4 GEMM and FlashAttention / MLA kernels
- Collaborate with compiler teams (Triton, LLVM, ROCm) to remove framework-level performance blockers
Open Source & Customer Enablement
- Upstream features and performance fixes into vLLM, SGLang, and llm-d
- Enable customer PoCs and production deployments on AMD platforms
- Build and maintain benchmark-grade inference pipelines
Preferred Experience
Inference Stack Knowledge
- Hands-on understanding of vLLM, SGLang, or similar inference stacks
- Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects
Deep Learning Integration
- Strong experience integrating optimized GPU kernels and libraries into machine-learning frameworks (e.g., PyTorch, TensorFlow) for high-throughput, scalable inference
Kernel & Inference Frameworks
- Strong background in NVIDIA, AMD, or similar GPU architectures and kernel development
Software Engineering
- Expertise in Python, and preferably C/C++, including debugging, performance tuning, and test design for large-scale systems
High-Performance Computing
- Experience running large-scale workloads on heterogeneous GPU clusters, optimizing for efficiency and scalability
Compiler & Runtime Optimization
- Understanding of compiler and runtime systems, including LLVM, ROCm, and GPU code generation
Academic Credentials
- Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field.

