Principal Software Engineer (AI Inference / Distributed Systems)

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is looking for a strategic software engineering lead who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate should be passionate about software engineering and possess the leadership skills to drive sophisticated issues to resolution. They should communicate effectively and work well with different teams across AMD.

KEY RESPONSIBILITIES:

Develop techniques for optimizing scale-up and scale-out inference.
Develop methods and tooling to utilize dynamic resources in service of inference.
Support proliferation of the ROCm ecosystem.

PREFERRED EXPERIENCE:

Expertise in the Kubernetes (K8s) ecosystem, especially as it pertains to large-scale inference.
Operational experience with at least one of SGLang or vLLM, and with frameworks such as KServe or llm-d. Experience running inference as a service can substitute for experience with frameworks such as KServe or llm-d.
Expertise with techniques used to optimize inference, such as distributed KV cache, disaggregation, and request scheduling.

Ability to write high-quality code with a keen attention to detail. Preferred languages are Go and Python.
Experience with modern concurrent programming.
Effective communicator with keen attention to detail.

Prior experience roadmapping deeply technical areas is highly valuable.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

This role is not eligible for visa sponsorship.

#LI-G11

#LI-HYBRID

Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.


Vacancy posted 2 days ago
