AI Infrastructure Engineer
Advanced Micro Devices, Inc.
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE PERSON:
We are seeking a DevOps / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate should be passionate about software engineering and possess the leadership skills to independently deliver on multi-quarter projects. They should be able to communicate effectively and work well with their peers within our larger organization. Finally, you aren't afraid of working on a team that operates in startup mode within a larger company, and you are willing to jump in to help in areas adjacent to your main project as needed.
Key Responsibilities
- Build and extend platform capabilities to enable new classes of workloads (e.g., interactive development pods, CI pipelines, inference services, benchmarking jobs).
- Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.
- Develop platform features such as secret management, configuration management, and deployment automation for customers.
- Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.
- Manage service lifecycle within Kubernetes using Helm and GitOps workflows (e.g., ArgoCD or Flux).
- Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.
Qualifications
- 5+ years of experience in DevOps, Platform, or Infrastructure Engineering.
- Deep hands-on experience with Kubernetes and container orchestration at scale.
- Proven ability to design and deliver platform features that serve internal customers or developer teams.
- Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).
- Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).
- Experience with Infrastructure as Code tools like Terraform.
- Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.
- Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).
- Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).
#LI-HYBRID
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process. AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here. This posting is for an existing vacancy.
Vacancy posted 5 days ago

