Principal GenAI Inference Optimization Engineer

Advanced Micro Devices

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

We are seeking a Principal GenAI Inference Optimization Engineer to join our Models and Applications team. This role focuses on improving performance, efficiency, and scalability of generative AI inference workloads on AMD GPU platforms. You will contribute to optimizing latency, throughput, and cost efficiency for real-world deployment of large-scale models, working across the software-hardware stack.

THE PERSON

The ideal candidate is a strong technical contributor with expertise in GenAI inference optimization, GPU performance, and large-scale serving systems. You have a solid understanding of GPU architecture, memory systems, and communication patterns, and can apply this knowledge to improve inference efficiency. You are comfortable working across multiple layers—from kernels and runtimes to frameworks and serving systems—and can independently drive optimization efforts while collaborating with cross-functional teams.

KEY RESPONSIBILITIES

- Optimize performance of GenAI inference workloads on AMD GPU platforms across single-node and distributed environments.
- Improve latency, throughput, and cost efficiency for LLM and multimodal model serving in production.
- Analyze and resolve bottlenecks across compute, memory, and communication (e.g., kernel efficiency, KV-cache usage, memory bandwidth, scheduling).
- Contribute to cross-stack optimizations spanning kernels, runtimes, communication libraries, and inference/serving frameworks (e.g., vLLM, SGLang, Triton, or similar systems).
- Implement and evaluate inference optimization techniques such as batching strategies, quantization, prefix caching, and speculative decoding.
- Support development and optimization of scalable serving systems, including request scheduling and resource utilization.
- Develop and use profiling, benchmarking, and performance analysis tools for inference workloads.
- Collaborate with hardware, compiler, and framework teams to improve overall system performance.
- Contribute to internal tools and, where applicable, open-source projects for inference optimization on AMD platforms.
- Document best practices and contribute to performance guidelines for GenAI deployment.

PREFERRED EXPERIENCE

- Strong understanding of GPU architecture and performance fundamentals (compute, memory hierarchy, interconnects such as PCIe/Infinity Fabric/RDMA).
- Experience with GenAI inference optimization techniques (e.g., quantization, KV-cache optimization, batching).
- Hands-on experience with inference/serving frameworks such as vLLM, SGLang, Triton, TensorRT-LLM, or similar.
- Experience working on LLM or multimodal inference workloads.
- Familiarity with distributed systems and serving architectures.
- Experience with ML frameworks (PyTorch, JAX, or TensorFlow), especially for inference.
- Proficiency in Python and at least one systems language (C++/CUDA/HIP).
- Experience with profiling, debugging, and performance tuning tools.
- Ability to work collaboratively across teams and deliver impactful optimizations.

ACADEMIC CREDENTIALS

B.S., M.S. or Ph.D. in Computer Science, Computer Engineering, or a related field preferred, or equivalent industry experience.

LOCATION

San Jose, CA

#LI-MV1

#HYBRID

This role is not eligible for visa sponsorship. Benefits offered are described in “AMD benefits at a glance.”

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.


Vacancy posted 18 hours ago

Similar jobs that could be interesting for you

Based on the Principal GenAI Inference Optimization Engineer in San Jose, CA vacancy

Principal Dynamo Systems Engineer — Scalable AI Inference
...leading tech company in California is seeking a Principal Software Engineer for the Dynamo platform, specializing in scalable AI inference in distributed environments. Candidates... ...of distributed systems. The role involves optimizing Kubernetes deployments and improving...
Principal
NVIDIA Corporation
Santa Clara, CA
2 days ago
Principal Software Engineer (AI Inference / Distributed Systems)
...AMD is looking for a strategic software engineering lead who is passionate about improving... ...Able to communicate effectively and work optimally with different teams across AMD. KEY... ...for optimizing scale-up and scale-out inference. Develop methods and tooling to...
Principal
Advanced Micro Devices , Inc.
Santa Clara, CA
18 hours ago
Sr. Multimodal Model Training and Inference Optimization Engineer
$244.8k
...synthesis, intelligent image/video editing, and virtual humans. We are seeking an experienced Multimodal Model Training and Inference Optimization Engineer with expertise in optimizing AI model training and inference, including distributed training/inference and acceleration...
Suggested
Temporary work
Local area
ByteDance
San Jose, CA
1 day ago
Principal Engineer, Solutions Architect Lead - Industrial & Embedded IoT, Edge AI On‑Prem Appliance
$220.2k - $330.4k
...Technologies, Inc. Job Area: Engineering Group, Engineering... ...for generative AI inference and computer vision workloads... ...cloud scenarios. As a Principal Systems Solutions... ...developing innovative genAI and hybridAI solutions... ..., profile, and optimize models and pipelines end...
Principal
Work experience placement
Work at office
Qualcomm
Santa Clara, CA
3 days ago
Senior DL Inference & Performance Engineer
$184k - $356.5k
...technology company in California is seeking a Senior DL Algorithms Engineer to drive inference performance for Deep Learning workloads. The role involves... ...model inference and collaborating with co-design teams to optimize performance across hardware and software interfaces....
Suggested
NVIDIA Corporation
Santa Clara, CA
3 days ago
Senior Compiler Engineer, AI Inference Performance
$152k - $241.5k
...looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring software engineers... ...DLC has been the backbone of NVIDIA’s inference engine, spanning across data centers, personal... ...networks and developing compiler optimization algorithms. Collaborating with...
NVIDIA
Santa Clara, CA
3 days ago
Senior LLM Performance Engineer - GPU Inference
$184k - $356.5k
...leading AI computing company in California is seeking a Senior Deep Learning Software Engineer focused on performance optimization of LLM models. You will analyze and enhance LLM inference performance, working in cross-collaborative teams to implement cutting-edge...
Full time
NVIDIA Corporation
Santa Clara, CA
1 day ago
Senior Quantized Inference Engineer - AI Throughput
A leading technology firm is seeking a Senior Software Engineer for Quantized Inference to implement quantized recipes for advanced model optimization. This role demands strong skills in Python and C++, alongside experience in ML accelerators and software engineering fundamentals...
NVIDIA Corporation
Santa Clara, CA
4 days ago
Principal GenAI Engineer — LLMs for Silicon Design
A leading technology firm in San Jose seeks a Principal GenAI Software Development Engineer to automate electronics design through innovative LLM solutions... ...The engineer will collaborate on EDA workflows and optimize design verification processes, ensuring high-quality...
Principal
Micron Memory Malaysia Sdn Bhd
San Jose, CA
18 hours ago
Senior DL Algorithms Engineer - Inference Performance
$184k - $287.5k
Locations: US,... ...senior engineers who are mindful of performance analysis and optimization to help us squeeze every last clock cycle out of Deep...
NVIDIA Corporation
Santa Clara, CA
1 day ago
Principal Engineer - AI Agents and Systems
$272k - $431.25k
...users worldwide. We are looking for a Principal Engineer to serve as a key technical leader in... ...PCs. By combining powerful local inference (Nemotron models) with robust privacy... ...Sandboxing: Guide the engineering efforts to optimize the agent runtimes for Windows. You...
Principal
Local area
Worldwide
NVIDIA
Santa Clara, CA
4 days ago
Senior Performance Engineer, Inference
...deliver industry‑leading training and inference speeds and empowers machine learning users... ...We are hiring a Senior Performance Engineer to join our Product team. You are an expert... ..., TensorRT‑LLM), GPU kernel‑level optimization toolchains (CUDA, Triton), and an intuitive...
Contract work
Shift work
Cerebras
Sunnyvale, CA
1 day ago
Senior GenAI Systems Engineer — Scalable Inference & APIs
$238.7k - $345.65k
A leading technology company in California is seeking a Senior Machine Learning Engineer to join the Generative AI Services team. This high-impact role involves designing and developing generative AI systems integrated into various Adobe products. Candidates should have...
Adobe Systems GmbH
San Jose, CA
18 hours ago
Analog Design Engineer, Sr Staff/Principal
...per week. The role: Analog Design Engineer, Sr Staff/Principal What You Will Do: Analog-mixed... ...Compute engine for Artificial Intelligence Inference Accelerator and High-Speed Die-2-Die... ...nodes from 4nm and below, and to optimize design and layout to achieve low power...
Principal
Full time
3 days per week
D-matrix
Santa Clara, CA
18 hours ago
Principal GenAI Technical Architect
...0/2026 We are seeking an experienced Principal GenAI Technical Architect to lead the design... ...collaboration with customer stakeholders and engineering teams to deliver secure, scalable, and... ...using Python and FastAPI. Build and optimize Generative AI applications leveraging...
Principal
Jansoft Global
San Jose, CA
18 hours ago
MuleSoft Engineer with GenAI
...The Software Engineer (MuleSoft Engineer with GenAI) role is a Contract with a client located in Santa Clara, CA (Remote). Must have: Mulesoft... ...and complex distributed integrations will be vital in optimizing our integration architecture and enhancing the overall...
Contract work
Remote work
InterSources
Santa Clara, CA
1 day ago
Compiler Engineer - AI Inference
$152k - $241.5k
...NVIDIA is seeking top-tier AI Compiler Engineers to drive innovation within our world-class... ...generation and computational graph optimizations for next-generation NVIDIA GPUs. Advance... ...problems for AI workloads (both inference and training) and successfully transition...
NVIDIA
Santa Clara, CA
3 days ago
LLM Algorithmic Optimization Engineer
$143.2k - $186k
...and apply cutting-edge technologies to optimize Large Language Models (LLMs) and multimodal... ..., for highly efficient LLM inference as well as deployment across distributed... ...s degree in Computer Science, Computer Engineering, Applied Mathematics, Communications, Electronics...
Full time
Temporary work
Flexible hours
NIO
San Jose, CA
4 days ago
Principal Software Engineer - AI Inference
$272k - $431.25k
...platform for every new AI-powered application. We seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves... ...(paging/sharding), memory planning, and streaming. Optimize core hot paths across the stack-from Python...
Principal
Remote work
NVIDIA
Santa Clara, CA
1 day ago
Senior Software Engineer, AI Inference Systems
$184k - $287.5k
...We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency... ...and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks,...
NVIDIA
Santa Clara, CA
3 days ago
Principal / Senior GPU SW Performance Engineer - PostTraining
...and beyond. Together, we advance your career. Principal / Senior GPU Software Performance Engineer - Post-Training THE ROLE: Drive the performance... ..., experiment orchestration, regression triage, and optimization at scale. THE PERSON: The ideal candidate...
Principal
Advanced Micro Devices , Inc.
San Jose, CA
1 day ago
Principal DSP Hardware Engineer - Coherent Optical Systems
Arycs Technologies, Inc. is seeking a Principal Hardware Engineer for Signal Processing to lead the design and validation of advanced DSP algorithms in coherent optical modules. The role involves optimizing signal chains and collaborating across various engineering teams...
Principal
Arycs Technologies, Inc.
Los Gatos, CA
2 days ago
Senior Software Engineer, Deep Learning Inference - Automotive Safety
$152k - $241.5k
...NVIDIA's TensorRT team as a Senior Software Engineer, and be at the forefront of technology, enabling high-performance AI inference solutions for automotive safety and other... ...requirements Contribute to performance optimization and benchmarking efforts for specialized...
NVIDIA
Santa Clara, CA
18 hours ago
Principal GenAI Engagement Lead, Partner Platforms
$272k - $431.25k
...a hands‑on, highly technical Principal Partner Engagement Lead to drive... ...of robust, scalable GenAI solutions that redefine enterprise... ...architectures that bring RAG, LLM inference, and Multi‑Agent workflows to... ...closely with NVIDIA Product, Engineering, Research, Solution...
Principal
NVIDIA
Santa Clara, CA
18 hours ago
Inference Engineer
Senior / Principal Machine Learning Engineer - Inference Serving Frameworks. Full-time | On-site | Bay Area. About the Company: We are a VC-backed, stealth-mode... ...highly skilled engineers who can help architect and optimize large‑scale inference systems across software,...
Full time
Acceler8 Talent
Santa Clara, CA
2 days ago
Principal Industrial Engineer: Digital Transformation
$103.6k - $155.4k
Northrop Grumman Corp. (JP) in Sunnyvale, CA is looking for a Principal Industrial Engineer. The role involves utilizing data analytics to optimize operational processes, leading digital transformation efforts, and creating metrics for performance monitoring. Candidates...
Principal
Northrop Grumman Corp. (JP)
Sunnyvale, CA
1 day ago
Principal Engineer, Hardware Development Engineering
...generation flash memory platforms for AI inference , where performance, efficiency, and... ...architecture , solve first-of-its-kind engineering problems , and directly influence how... ...architecture, and platform teams to optimize real-world AI workloads Develop and...
Principal
Temporary work
Remote work
Flexible hours
Shift work
Sandisk
Milpitas, CA
4 days ago
Principal / Sr. Engineer Metrology CMM
$103.6k - $155.4k
...history. Northrop Grumman Mission Systems is looking for a Principal / Sr. Principal Metrology Engineer CMM located in Sunnyvale, CA . What you'll get to do:... ...daily metrology workflow to drive efficiency and optimal resource utilization. Develop solutions to complex...
Principal
Relocation package
Shift work
Northrop Grumman Corp. (JP)
Sunnyvale, CA
2 days ago
Principal PMIC Architect: AI-Optimized Power Delivery
$175k - $350k
...seeking a Power Architecture Lead for PMIC in San Jose, CA. In this role, you will define and oversee the IVR architecture, ensure optimal performance across power domain specifications, and collaborate with foundry partners. Qualified candidates should possess over 15...
Principal
TylSemi
San Jose, CA
3 days ago
Principal Analog Design Engineer
$175k - $350k
Role Overview: As an Analog Design Engineer at TylSemi, you will design and deliver high-performance analog and mixed-signal circuits that... ...transistor‑level design, simulation, and sign‑off. Design and optimize analog/mixed‑signal circuits such as amplifiers, references,...
Principal
TylSemi, Inc.
San Jose, CA
4 days ago