Principal GenAI Inference Optimization Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

We are seeking a Principal GenAI Inference Optimization Engineer to join our Models and Applications team. This role focuses on improving performance, efficiency, and scalability of generative AI inference workloads on AMD GPU platforms. You will contribute to optimizing latency, throughput, and cost efficiency for real-world deployment of large-scale models, working across the software-hardware stack.

THE PERSON

The ideal candidate is a strong technical contributor with expertise in GenAI inference optimization, GPU performance, and large-scale serving systems. You have a solid understanding of GPU architecture, memory systems, and communication patterns, and can apply this knowledge to improve inference efficiency. You are comfortable working across multiple layers, from kernels and runtimes to frameworks and serving systems, and can independently drive optimization efforts while collaborating with cross-functional teams.

KEY RESPONSIBILITIES

  • Optimize performance of GenAI inference workloads on AMD GPU platforms across single-node and distributed environments.
  • Improve latency, throughput, and cost efficiency for LLM and multimodal model serving in production.
  • Analyze and resolve bottlenecks across compute, memory, and communication (e.g., kernel efficiency, KV-cache usage, memory bandwidth, scheduling).
  • Contribute to cross-stack optimizations spanning kernels, runtimes, communication libraries, and inference/serving frameworks (e.g., vLLM, SGLang, Triton, or similar systems).
  • Implement and evaluate inference optimization techniques such as batching strategies, quantization, prefix caching, and speculative decoding.
  • Support development and optimization of scalable serving systems, including request scheduling and resource utilization.
  • Develop and use profiling, benchmarking, and performance analysis tools for inference workloads.
  • Collaborate with hardware, compiler, and framework teams to improve overall system performance.
  • Contribute to internal tools and, where applicable, open-source projects for inference optimization on AMD platforms.
  • Document best practices and contribute to performance guidelines for GenAI deployment.

PREFERRED EXPERIENCE

  • Strong understanding of GPU architecture and performance fundamentals (compute, memory hierarchy, interconnects such as PCIe/Infinity Fabric/RDMA).
  • Experience with GenAI inference optimization techniques (e.g., quantization, KV-cache optimization, batching).
  • Hands-on experience with inference/serving frameworks such as vLLM, SGLang, Triton, TensorRT-LLM, or similar.
  • Experience working on LLM or multimodal inference workloads.
  • Familiarity with distributed systems and serving architectures.
  • Experience with ML frameworks (PyTorch, JAX, or TensorFlow), especially for inference.
  • Proficiency in Python and at least one systems language (C++/CUDA/HIP).
  • Experience with profiling, debugging, and performance tuning tools.
  • Ability to work collaboratively across teams and deliver impactful optimizations.

ACADEMIC CREDENTIALS

B.S., M.S. or Ph.D. in Computer Science, Computer Engineering, or a related field preferred, or equivalent industry experience.

LOCATION

San Jose, CA (Hybrid)

This role is not eligible for visa sponsorship.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Vacancy posted 7 hours ago
Similar jobs that could be interesting for you
Based on the Principal GenAI Inference Optimization Engineer in San Jose, CA vacancy
  •  ...AMD is looking for a strategic software engineering lead who is passionate about improving...  ...Able to communicate effectively and work optimally with different teams across AMD. KEY...  ...for optimizing scale-up and scale-out inference. Develop methods and tooling to utilize... 
    Principal

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    6 hours ago
  • NVIDIA is seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal...  ...and 3+ years of experience in deep learning, specifically in inference. This role involves profiling, analyzing bottlenecks, and... 
    Suggested

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...is looking for a Senior Staff AI Infra Engineer who is passionate about improving the performance...  ...of hardware and software to optimize performance for next-generation AI applications...  ...and accelerate LLM training and inference on AMD GPUs, improving kernel, communication... 
    Principal

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
  • $244.8k

     ...synthesis, intelligent image/video editing, and virtual humans. We are seeking an experienced Multimodal Model Training and Inference Optimization Engineer with expertise in optimizing AI model training and inference, including distributed training/inference and acceleration... 
    Suggested
    Temporary work
    Local area

    ByteDance

    San Jose, CA
    2 days ago
  • $184k - $356.5k

    NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $182k - $273k

     ...: Ampere Computing is seeking a skilled CPU and SoC Power Lead to join our Silicon Engineering team. In this role, you will focus on power modeling, analysis, and optimization to help design high-performance, power-efficient CPUs and SoCs. You will collaborate across... 
    Principal
    Local area

    Ampere

    Santa Clara, CA
    1 day ago
  •  ...NVIDIA Gruppe is looking for a skilled professional to enhance the performance of large-scale models through advanced optimization techniques in Santa Clara, California. Candidates should have a strong background in DL model training and deployment, ideally with a PhD... 

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $220.2k - $330.4k

     ...Technologies, Inc. Job Area: Engineering Group, Engineering...  ...for generative AI inference and computer vision workloads...  ...cloud scenarios. As a Principal Systems Solutions...  ...developing innovative genAI and hybridAI solutions...  ..., profile, and optimize models and pipelines end... 
    Principal
    Work experience placement
    Work at office

    Qualcomm

    Santa Clara, CA
    4 days ago
  • $184k - $356.5k

     ...NVIDIA Gruppe is looking for a Senior Software Engineer specializing in Deep Learning Inference in Santa Clara, California. You will design and optimize GPU-accelerated software critical for advanced AI applications, contributing to libraries like vLLM and SGLang. Ideal... 

    NVIDIA Gruppe

    Santa Clara, CA
    7 hours ago
  •  ...A leading technology firm is seeking a Senior Software Engineer for Quantized Inference to implement quantized recipes for advanced model optimization. This role demands strong skills in Python and C++, alongside experience in ML accelerators and software engineering fundamentals... 

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...NVIDIA Gruppe is looking for a skilled engineer to join their TensorRT Edge-LLM team in Santa Clara, California...  .... The role involves developing a state-of-the-art inference framework for large language models and optimizing it for real-time performance on embedded platforms... 

    NVIDIA Gruppe

    Santa Clara, CA
    7 hours ago
  •  ...Advanced Micro Devices in Santa Clara seeks a Senior ML Engineer focused on optimizing large language model inference runtimes. The role involves architecting distributed systems and enhancing performance across GPUs. Ideal candidates will have expertise in Python and... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    1 day ago
  • $272k - $431.25k

     ...NVIDIA Gruppe is looking for a Principal Software Engineer to advance open-source AI inference. This hands-on role emphasizes running high-performance inference...  ...across various teams. Key responsibilities include optimizing inference runtimes, improving efficiency, and... 
    Principal

    NVIDIA Gruppe

    Santa Clara, CA
    7 hours ago
  • $184k - $356.5k

     ...technology company in California is seeking a Senior DL Algorithms Engineer to drive inference performance for Deep Learning workloads. The role involves...  ...model inference and collaborating with co-design teams to optimize performance across hardware and software interfaces.... 

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $184k - $356.5k

     ...leading AI computing company in California is seeking a Senior Deep Learning Software Engineer focused on performance optimization of LLM models. You will analyze and enhance LLM inference performance, working in cross-collaborative teams to implement cutting-edge... 
    Full time

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • NVIDIA Corporation in Santa Clara seeks a Principal Software Engineer - AI Inference to advance open-source LLM serving. This hands-on role focuses on optimizing inference engines like vLLM and SGLang for NVIDIA GPUs, requiring deep technical skill and collaboration across... 
    Principal

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $152k - $241.5k

     ...looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring software engineers...  ...DLC has been the backbone of NVIDIA’s inference engine, spanning across data centers, personal...  ...networks and developing compiler optimization algorithms. Collaborating with... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $255.85k - $361.2k

    Job Overview We are seeking a Principal Engineer to define and architect the next generation of...  ...focuses on dynamically executing and optimizing large‑scale AI computation graphs across...  ...a PhD. Experience with AI/ML systems, inference infrastructure, or large‑scale model... 
    Principal
    Local area
    Shift work

    Intel Corporation

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

    We are looking for a Senior DL Algorithms Engineer for LLM/Omni model optimizations! Seeking senior engineers who are mindful of performance analysis...  ...models (like Nemotron and Cosmos) on NVIDIA’s accelerated inference SW stack. Contribute new features, fix bugs and... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $272k - $431.25k

     ...users worldwide. We are looking for a Principal Engineer to serve as a key technical leader in...  ...PCs. By combining powerful local inference (Nemotron models) with robust privacy...  ...Sandboxing: Guide the engineering efforts to optimize the agent runtimes for Windows. You... 
    Principal
    Local area
    Worldwide

    NVIDIA

    Santa Clara, CA
    5 days ago
  • $152k - $287.5k

     ...Gruppe is seeking a Senior Machine Learning Applications and Compiler Engineer in Santa Clara, California. This role involves developing algorithms for their LPX inference and compiler stack, optimizing the performance of neural network workloads on NVIDIA platforms.... 

    NVIDIA Gruppe

    Santa Clara, CA
    7 hours ago
  •  ...deliver industry‑leading training and inference speeds and empowers machine learning users...  ..., TensorRT‑LLM), GPU kernel‑level optimization toolchains (CUDA, Triton), and an intuitive...  .... Collaborate with Product and Engineering to identify where competitors are closing... 
    Contract work
    Shift work

    Cerebras

    Sunnyvale, CA
    2 days ago
  •  ...20/2026 We are seeking an experienced Principal GenAI Technical Architect to lead the design...  ...collaboration with customer stakeholders and engineering teams to deliver secure, scalable, and...  ...using Python and FastAPI. Build and optimize Generative AI applications leveraging... 
    Principal

    Jansoft Global

    San Jose, CA
    7 hours ago
  •  ...Advanced Micro Devices is seeking a strategic software engineering lead in Santa Clara, California. This role involves improving...  ...software. Key responsibilities include developing techniques for inference optimization and supporting the ROCm ecosystem expansion. A Bachelor’s... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    7 hours ago
  • $272k - $431.25k

     ...a hands‑on, highly technical Principal Partner Engagement Lead to drive...  ...of robust, scalable GenAI solutions that redefine enterprise...  ...architectures that bring RAG, LLM inference, and Multi‑Agent workflows to...  ...closely with NVIDIA Product, Engineering, Research, Solution... 
    Principal

    NVIDIA Gruppe

    Santa Clara, CA
    6 hours ago
  • $272k - $431.25k

     ...platform for every new AI-powered application. We seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves...  ...(paging/sharding), memory planning, and streaming. Optimize core hot paths across the stack—from Python orchestration... 
    Principal

    NVIDIA Gruppe

    Santa Clara, CA
    6 hours ago
  •  ...The Software Engineer (MuleSoft Engineer with GenAI) role is a Contract with a client located in Santa Clara, CA (Remote). Must have: Mulesoft...  ...and complex distributed integrations will be vital in optimizing our integration architecture and enhancing the overall... 
    Contract work
    Remote work

    InterSources

    Santa Clara, CA
    2 days ago
  • $143.2k - $186k

     ...and apply cutting‑edge technologies to optimize large language models (LLMs) and multimodal...  ...architectures for efficient LLM inference as well as deployment across distributed...  ...s degree in Computer Science, Computer Engineering, Applied Mathematics, Communications, Electronics... 
    Full time
    Temporary work
    Flexible hours

    1600 NIO USA, Inc.

    San Jose, CA
    7 hours ago
  • $152k - $241.5k

     ...NVIDIA is seeking top-tier AI Compiler Engineers to drive innovation within our world-class...  ...generation and computational graph optimizations for next-generation NVIDIA GPUs. Advance...  ...problems for AI workloads (both inference and training) and successfully transition... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $143.2k - $186k

     ...and apply cutting-edge technologies to optimize Large Language Models (LLMs) and multimodal...  ..., for highly efficient LLM inference as well as deployment across distributed...  ...s degree in Computer Science, Computer Engineering, Applied Mathematics, Communications, Electronics... 
    Full time
    Temporary work
    Flexible hours

    NIO

    San Jose, CA
    5 days ago
