AI Infrastructure Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING


At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE PERSON:


We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate is passionate about software engineering and has the leadership skills to independently deliver multi-quarter projects. They communicate effectively and work well with peers across our larger organization. Finally, they are comfortable on a team that operates in startup mode within a larger company, and they are willing to jump in to help in areas adjacent to their main project as needed.

Key Responsibilities
  • Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs).
  • Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.
  • Develop platform features such as secret management, configuration management, and deployment automation for customers.
  • Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
  • Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux).
  • Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.
Required Qualifications
  • 5+ years of experience in DevOps, Platform, or Infrastructure Engineering.
  • Deep hands-on experience with Kubernetes and container orchestration at scale.
  • Proven ability to design and deliver platform features that serve internal customers or developer teams.
  • Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).
Nice to Have
  • Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).
  • Experience with Infrastructure as Code tools like Terraform.
  • Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.
  • Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).
  • Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).

#LI-HYBRID

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.
Vacancy posted 5 days ago