Principal Software Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING


At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

As a Principal AI Infrastructure Solution Engineer, you will partner with AMD's AI software teams and customers to enable large-scale LLM training and inference on AMD Instinct GPUs. You will design and validate production-ready Kubernetes architectures and translate inference frameworks such as vLLM and SGLang into deployable customer solutions. Your work will accelerate customer time-to-production and strengthen AMD's leadership in AI infrastructure.

THE PERSON:

You are a solution-oriented AI infrastructure engineer with strong expertise in GPU-accelerated computing and large-scale AI deployments. You excel at translating complex technologies into customer-ready solutions and delivering production-grade Kubernetes-based inference and training systems. You bring hands-on experience with Kubernetes-native distributed training, including scheduling, topology-aware GPU placement, and operating resilient, high-performance AI workloads at scale.

KEY RESPONSIBILITIES:
  • Design and deliver reference architectures for LLM training and inference on AMD GPUs, from single-node to multi-datacenter deployments using Kubernetes and SLURM.
  • Architect and validate Kubernetes-based distributed training stacks for large-scale LLM workloads on AMD GPUs.
  • Define and implement gang scheduling and topology-aware GPU placement for multi-node training workloads.
  • Enable Kubernetes-native training controllers including Kubeflow Training Operator, MPI Operator, Volcano, and Kueue.
  • Partner with enterprise customers and cloud providers to deploy and optimize production AMD GPU clusters for distributed inference and multi-tenant workloads.
  • Implement and validate GPU orchestration using Kubernetes GPU Operator, device plugins, metrics exporters, and SLURM controllers.
  • Benchmark and optimize LLM inference frameworks (vLLM, SGLang) on AMD hardware, producing customer-ready performance playbooks.
  • Develop repeatable benchmarks for Kubernetes-based distributed training, covering scaling efficiency, step time, communication, and checkpointing.
  • Create tuning guides for RCCL/NCCL-equivalent communication, CPU/GPU affinity, interconnect utilization, and workload-specific optimizations.
  • Serve as the feedback loop between customers and AMD engineering, translating requirements into validated performance improvements.
PREFERRED EXPERIENCE:
  • Deployed and operated large-scale GPU clusters for production AI training and inference
  • Deep expertise in Kubernetes GPU orchestration (operators, device plugins, scheduling, multi-tenancy, observability)
  • Hands-on experience with distributed training on Kubernetes (Kubeflow, MPI Operator, Volcano, Kueue, Ray)
  • Strong knowledge of gang scheduling, elastic jobs, quotas, priority, and shared GPU environments
  • Tuned Kubernetes networking and storage for AI workloads (high-performance CNI, RDMA where applicable, scalable checkpointing)
  • Implemented ML observability for training (GPU/comms metrics, step-time analysis, SLO-driven ops)
  • Experience in AI/ML infrastructure, solution architecture, and production GPU deployments
  • Proven success enabling customers through complex AI platform deployments and migrations
  • Strong background working across engineering and customer-facing roles
  • Understanding of AI accelerator architectures and inference optimization techniques
  • Experience operationalizing Kubernetes-based distributed training at scale
  • Open-source contributions or AI infrastructure community engagement (plus)
LOCATION:
  • Santa Clara, CA, or open to discussing other locations.

This role is not eligible for visa sponsorship.

#LI-EV1

#LI-HYBRID

Benefits offered are described in: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.
Vacancy posted 4 days ago