AI Infrastructure Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING


At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE PERSON:


We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate is passionate about software engineering and has the leadership skills to independently deliver multi-quarter projects. They communicate effectively and work well with peers across our larger organization. Finally, they are comfortable on a team that operates in startup mode within a larger company, and they are willing to jump in and help in areas adjacent to their main project as needed.

Key Responsibilities
  • Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs).
  • Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.
  • Develop platform features such as secret management, configuration management, and deployment automation for customers.
  • Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
  • Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux).
  • Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.
Required Qualifications
  • 5+ years of experience in DevOps, Platform, or Infrastructure Engineering.
  • Deep hands-on experience with Kubernetes and container orchestration at scale.
  • Proven ability to design and deliver platform features that serve internal customers or developer teams.
  • Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).
Nice to Have
  • Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).
  • Experience with Infrastructure as Code tools like Terraform.
  • Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.
  • Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).
  • Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).

#LI-G11


#LI-HYBRID

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.
Vacancy posted 3 days ago