Principal Software Engineer
Advanced Micro Devices, Inc.
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
As a Principal AI Infrastructure Solution Engineer, you will partner with AMD's AI software teams and customers to enable large-scale LLM training and inference on AMD Instinct GPUs. You will design and validate production-ready Kubernetes architectures and translate inference frameworks such as vLLM and SGLang into deployable customer solutions. Your work will accelerate customer time-to-production and strengthen AMD's leadership in AI infrastructure.
THE PERSON:
You are a solution-oriented AI infrastructure engineer with strong expertise in GPU-accelerated computing and large-scale AI deployments. You excel at translating complex technologies into customer-ready solutions and delivering production-grade Kubernetes-based inference and training systems. You bring hands-on experience with Kubernetes-native distributed training, including scheduling, topology-aware GPU placement, and operating resilient, high-performance AI workloads at scale.
KEY RESPONSIBILITIES:
- Design and deliver reference architectures for LLM training and inference on AMD GPUs, from single-node to multi-datacenter deployments using Kubernetes and SLURM.
- Architect and validate Kubernetes-based distributed training stacks for large-scale LLM workloads on AMD GPUs.
- Define and implement gang scheduling and topology-aware GPU placement for multi-node training workloads.
- Enable Kubernetes-native training controllers including Kubeflow Training Operator, MPI Operator, Volcano, and Kueue.
- Partner with enterprise customers and cloud providers to deploy and optimize production AMD GPU clusters for distributed inference and multi-tenant workloads.
- Implement and validate GPU orchestration using Kubernetes GPU Operator, device plugins, metrics exporters, and SLURM controllers.
- Benchmark and optimize LLM inference frameworks (vLLM, SGLang) on AMD hardware, producing customer-ready performance playbooks.
- Develop repeatable benchmarks for Kubernetes-based distributed training, covering scaling efficiency, step time, communication, and checkpointing.
- Create tuning guides for RCCL/NCCL-equivalent communication, CPU/GPU affinity, interconnect utilization, and workload-specific optimizations.
- Serve as the feedback loop between customers and AMD engineering, translating requirements into validated performance improvements.
- Deployed and operated large-scale GPU clusters for production AI training and inference
- Deep expertise in Kubernetes GPU orchestration (operators, device plugins, scheduling, multi-tenancy, observability)
- Hands-on experience with distributed training on Kubernetes (Kubeflow, MPI Operator, Volcano, Kueue, Ray)
- Strong knowledge of gang scheduling, elastic jobs, quotas, priority, and shared GPU environments
- Tuned Kubernetes networking and storage for AI workloads (high-performance CNI, RDMA where applicable, scalable checkpointing)
- Implemented ML observability for training (GPU/comms metrics, step-time analysis, SLO-driven ops)
- Experience in AI/ML infrastructure, solution architecture, and production GPU deployments
- Proven success enabling customers through complex AI platform deployments and migrations
- Strong background working across engineering and customer-facing roles
- Understanding of AI accelerator architectures and inference optimization techniques
- Experience operationalizing Kubernetes-based distributed training at scale
- Open-source contributions or AI infrastructure community engagement (plus)
- Santa Clara, CA, or open to discussing other locations.
Vacancy posted 4 days ago

