Principal Software Engineer (AI Inference / Distributed Systems)
Advanced Micro Devices, Inc.
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
AMD is looking for a strategic software engineering lead who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.
THE PERSON:
The ideal candidate is passionate about software engineering and has the leadership skills to drive sophisticated issues to resolution. They communicate effectively and work well with different teams across AMD.
KEY RESPONSIBILITIES:
- Develop techniques for optimizing scale-up and scale-out inference.
- Develop methods and tooling to utilize dynamic resources in service of inference.
- Support the proliferation of the ROCm ecosystem.
PREFERRED EXPERIENCE:
- Expertise in the Kubernetes (K8s) ecosystem, especially as it pertains to large-scale inference.
- Operational experience with at least one of SGLang or vLLM, and with KServe or llm-d. Experience running inference as a service can be substituted in lieu of experience with frameworks such as KServe or llm-d.
- Expertise with techniques used to optimize inference, such as distributed KV cache, disaggregation, and request scheduling.
- Ability to write high-quality code with keen attention to detail. Preferred languages are Go and Python.
- Experience with modern concurrent programming.
- Effective communicator with keen attention to detail.
- Prior experience roadmapping deeply technical areas is highly valuable.
ACADEMIC CREDENTIALS:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
This role is not eligible for visa sponsorship.
#LI-G11
#LI-HYBRID
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.
This posting is for an existing vacancy.

