Principal Software Engineer (AI Inference / Distributed Systems)

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is looking for a strategic software engineering lead who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate should be passionate about software engineering and possess the leadership skills to drive sophisticated issues to resolution. They should be able to communicate effectively and work well with different teams across AMD.

KEY RESPONSIBILITIES:

  • Develop techniques for optimizing scale-up and scale-out inference.
  • Develop methods and tooling to utilize dynamic resources in service of inference.
  • Support proliferation of the ROCm ecosystem.

PREFERRED EXPERIENCE:

  • Expertise in the Kubernetes (K8s) ecosystem, especially as it pertains to large-scale inference.
  • Operational experience with at least one of SGLang or vLLM, and with KServe or llm-d. Experience running inference as a service can be substituted in lieu of experience with frameworks such as KServe or llm-d.
  • Expertise with techniques used to optimize inference, such as distributed KV cache, disaggregation, and request scheduling.
  • Ability to write high-quality code with keen attention to detail. Preferred languages are Go and Python.
  • Experience with modern concurrent programming.
  • Effective communicator with keen attention to detail.
  • Prior experience roadmapping deeply technical areas is highly valuable.

ACADEMIC CREDENTIALS:

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

This role is not eligible for visa sponsorship.

#LI-G11

#LI-HYBRID

Benefits offered are described in "AMD benefits at a glance".

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.

Vacancy posted 1 day ago
