AI Infrastructure Engineer
Advanced Micro Devices, Inc.
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE PERSON:
We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate is passionate about software engineering and has the leadership skills to independently deliver multi-quarter projects. They communicate effectively and work well with peers across our larger organization. Finally, they are comfortable on a team operating in more of a startup mode within a larger company and are willing to jump in to help in areas adjacent to their main project as needed.
Key Responsibilities
- Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs).
- Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.
- Develop platform features such as secret management, configuration management, and deployment automation for customers.
- Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
- Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux).
- Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.
Qualifications
- 5+ years of experience in DevOps, Platform, or Infrastructure Engineering.
- Deep hands-on experience with Kubernetes and container orchestration at scale.
- Proven ability to design and deliver platform features that serve internal customers or developer teams.
- Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).
- Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).
- Experience with Infrastructure as Code tools like Terraform.
- Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.
- Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).
- Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).
#LI-HYBRID
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.
This posting is for an existing vacancy.
Vacancy posted 3 days ago

