Principal Software Engineer
Advanced Micro Devices, Inc.
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
As a Principal AI Infrastructure Solution Engineer, you will partner with AMD's AI software teams and customers to enable large-scale LLM training and inference on AMD Instinct GPUs. You will design and validate production-ready Kubernetes architectures and translate inference frameworks such as vLLM and SGLang into deployable customer solutions. Your work will accelerate customer time-to-production and strengthen AMD's leadership in AI infrastructure.
THE PERSON:
You are a solution-oriented AI infrastructure engineer with strong expertise in GPU-accelerated computing and large-scale AI deployments. You excel at translating complex technologies into customer-ready solutions and delivering production-grade Kubernetes-based inference and training systems. You bring hands-on experience with Kubernetes-native distributed training, including scheduling, topology-aware GPU placement, and operating resilient, high-performance AI workloads at scale.
KEY RESPONSIBILITIES:
- Design and deliver reference architectures for LLM training and inference on AMD GPUs, from single-node to multi-datacenter deployments using Kubernetes and SLURM.
- Architect and validate Kubernetes-based distributed training stacks for large-scale LLM workloads on AMD GPUs.
- Define and implement gang scheduling and topology-aware GPU placement for multi-node training workloads.
- Enable Kubernetes-native training controllers including Kubeflow Training Operator, MPI Operator, Volcano, and Kueue.
- Partner with enterprise customers and cloud providers to deploy and optimize production AMD GPU clusters for distributed inference and multi-tenant workloads.
- Implement and validate GPU orchestration using Kubernetes GPU Operator, device plugins, metrics exporters, and SLURM controllers.
- Benchmark and optimize LLM inference frameworks (vLLM, SGLang) on AMD hardware, producing customer-ready performance playbooks.
- Develop repeatable benchmarks for Kubernetes-based distributed training, covering scaling efficiency, step time, communication, and checkpointing.
- Create tuning guides for RCCL/NCCL-equivalent communication, CPU/GPU affinity, interconnect utilization, and workload-specific optimizations.
- Serve as the feedback loop between customers and AMD engineering, translating requirements into validated performance improvements.
- Deployed and operated large-scale GPU clusters for production AI training and inference
- Deep expertise in Kubernetes GPU orchestration (operators, device plugins, scheduling, multi-tenancy, observability)
- Hands-on experience with distributed training on Kubernetes (Kubeflow, MPI Operator, Volcano, Kueue, Ray)
- Strong knowledge of gang scheduling, elastic jobs, quotas, priority, and shared GPU environments
- Tuned Kubernetes networking and storage for AI workloads (high-performance CNI, RDMA where applicable, scalable checkpointing)
- Implemented ML observability for training (GPU/comms metrics, step-time analysis, SLO-driven ops)
- Experience in AI/ML infrastructure, solution architecture, and production GPU deployments
- Proven success enabling customers through complex AI platform deployments and migrations
- Strong background working across engineering and customer-facing roles
- Understanding of AI accelerator architectures and inference optimization techniques
- Experience operationalizing Kubernetes-based distributed training at scale
- Open-source contributions or AI infrastructure community engagement (plus)
- Santa Clara, CA, or open to discussing other locations.
Vacancy posted 4 days ago

