Principal Software Engineer (AI Inference / Distributed Systems)

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is looking for a strategic software engineering lead who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate should be passionate about software engineering and possess the leadership skills to drive sophisticated issues to resolution. They should also be able to communicate effectively and work well with different teams across AMD.

KEY RESPONSIBILITIES:

  • Develop techniques for optimizing scale-up and scale-out inference.
  • Develop methods and tooling to utilize dynamic resources in service of inference.
  • Support proliferation of the ROCm ecosystem.

PREFERRED EXPERIENCE:

  • Expertise in the Kubernetes (K8s) ecosystem, especially as it pertains to large-scale inference.
  • Operational experience with at least one of SGLang or vLLM, and with KServe or llm-d. Experience running inference as a service can substitute for experience with frameworks such as KServe or llm-d.
  • Expertise with techniques used to optimize inference, such as distributed KV cache, disaggregation, and request scheduling.
  • Ability to write high-quality code with a keen attention to detail. Preferred languages are Go and Python.
  • Experience with modern concurrent programming.
  • Effective communicator with keen attention to detail.
  • Prior experience roadmapping deeply technical areas is highly valuable.

ACADEMIC CREDENTIALS:

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

This role is not eligible for visa sponsorship.

#LI-G11

#LI-HYBRID

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.

Vacancy posted 2 days ago