Principal Software Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING


At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

As a Principal AI Infrastructure Solution Engineer, you will partner with AMD's AI software teams and customers to enable large-scale LLM training and inference on AMD Instinct GPUs. You will design and validate production-ready Kubernetes architectures and translate inference frameworks such as vLLM and SGLang into deployable customer solutions. Your work will accelerate customer time-to-production and strengthen AMD's leadership in AI infrastructure.

THE PERSON:

You are a solution-oriented AI infrastructure engineer with strong expertise in GPU-accelerated computing and large-scale AI deployments. You excel at translating complex technologies into customer-ready solutions and delivering production-grade Kubernetes-based inference and training systems. You bring hands-on experience with Kubernetes-native distributed training, including scheduling, topology-aware GPU placement, and operating resilient, high-performance AI workloads at scale.

KEY RESPONSIBILITIES:
  • Design and deliver reference architectures for LLM training and inference on AMD GPUs, from single-node to multi-datacenter deployments using Kubernetes and SLURM.
  • Architect and validate Kubernetes-based distributed training stacks for large-scale LLM workloads on AMD GPUs.
  • Define and implement gang scheduling and topology-aware GPU placement for multi-node training workloads.
  • Enable Kubernetes-native training controllers including Kubeflow Training Operator, MPI Operator, Volcano, and Kueue.
  • Partner with enterprise customers and cloud providers to deploy and optimize production AMD GPU clusters for distributed inference and multi-tenant workloads.
  • Implement and validate GPU orchestration using Kubernetes GPU Operator, device plugins, metrics exporters, and SLURM controllers.
  • Benchmark and optimize LLM inference frameworks (vLLM, SGLang) on AMD hardware, producing customer-ready performance playbooks.
  • Develop repeatable benchmarks for Kubernetes-based distributed training, covering scaling efficiency, step time, communication, and checkpointing.
  • Create tuning guides for RCCL/NCCL-equivalent communication, CPU/GPU affinity, interconnect utilization, and workload-specific optimizations.
  • Serve as the feedback loop between customers and AMD engineering, translating requirements into validated performance improvements.
PREFERRED EXPERIENCE:
  • Deployed and operated large-scale GPU clusters for production AI training and inference
  • Deep expertise in Kubernetes GPU orchestration (operators, device plugins, scheduling, multi-tenancy, observability)
  • Hands-on experience with distributed training on Kubernetes (Kubeflow, MPI Operator, Volcano, Kueue, Ray)
  • Strong knowledge of gang scheduling, elastic jobs, quotas, priority, and shared GPU environments
  • Tuned Kubernetes networking and storage for AI workloads (high-performance CNI, RDMA where applicable, scalable checkpointing)
  • Implemented ML observability for training (GPU/comms metrics, step-time analysis, SLO-driven ops)
  • Experience in AI/ML infrastructure, solution architecture, and production GPU deployments
  • Proven success enabling customers through complex AI platform deployments and migrations
  • Strong background working across engineering and customer-facing roles
  • Understanding of AI accelerator architectures and inference optimization techniques
  • Experience operationalizing Kubernetes-based distributed training at scale
  • Open-source contributions or AI infrastructure community engagement (plus)
LOCATION:
  • Santa Clara, CA, or open to discussing other locations.

This role is not eligible for visa sponsorship.

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.
Vacancy posted 4 days ago