AI Infrastructure Engineer

Advanced Micro Devices, Inc.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE PERSON:

We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate is passionate about software engineering and has the leadership skills to independently deliver multi-quarter projects. They communicate effectively and work well with peers across our larger organization. Finally, they are comfortable on a team that operates in startup mode within a larger company and are willing to jump in to help in areas adjacent to their main project as needed.

Key Responsibilities

Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs).
Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.
Develop platform features such as secret management, configuration management, and deployment automation for customers.
Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux).
Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.

Required Qualifications

5+ years of experience in DevOps, Platform, or Infrastructure Engineering.
Deep hands-on experience with Kubernetes and container orchestration at scale.
Proven ability to design and deliver platform features that serve internal customers or developer teams.
Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).

Nice to Have

Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).
Experience with Infrastructure as Code tools like Terraform.
Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.
Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).
Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.

Vacancy posted 3 days ago
