Director - Hyperscale, HPC & Sovereign AI Deployment and Fleet Operations
Advanced Micro Devices
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next‑generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
As the Director of Cloud, HPC & Sovereign AI Customer Engineering within the Compute & Enterprise AI Solutions Customer Engineering organization, you will lead the team responsible for enabling successful deployment, adoption, and lifecycle support of AMD compute and AI solutions across Cloud, HPC, and Sovereign AI customers. This highly visible leadership role oversees a team of Customer Program Managers (CPMs) responsible for guiding customers through new product introductions, deployment readiness, production ramp, and long‑term fleet sustainment. Working closely with customers, Customer Platform Engineering, Product Management, Engineering, and Architecture, you will drive successful customer outcomes across the full lifecycle of AMD compute and AI platforms.
THE PERSON:
The ideal candidate is a strong technical and organizational leader with experience supporting large‑scale cloud, AI, HPC, or datacenter deployments. You have a proven track record of leading customer‑facing technical teams, managing complex customer engagements, and driving successful deployment and sustainment of infrastructure at scale. You are equally comfortable engaging with customer executives, architects, operations teams, and engineering organizations while driving alignment across AMD. You possess strong technical credibility, customer advocacy skills, and the ability to lead through influence in complex environments.
KEY RESPONSIBILITIES:
- Lead and scale the Cloud, HPC & Sovereign AI Customer Engineering organization supporting strategic customer deployments and lifecycle management.
- Lead a team of Customer Program Managers responsible for customer engagement, deployment readiness, new product introduction (NPI) execution, production ramp, and fleet sustainment activities.
- Drive successful deployment, adoption, and operational readiness of AMD compute and AI solutions across Cloud, HPC, and Sovereign AI customers.
- Serve as the executive escalation point for strategic customer issues and drive resolution of complex deployment, platform, performance, and operational challenges.
- Partner closely with Customer Platform Engineering teams, including PAE, BAE, Security Engineering, and Debug Engineering, to ensure successful customer outcomes.
- Develop deployment methodologies, operational best practices, and customer engagement frameworks that accelerate customer time‑to‑production.
- Drive fleet sustainment strategies including observability, telemetry, remote diagnostics, lifecycle management, and operational readiness.
- Partner with customers and AMD engineering teams to support successful platform deployment and long‑term fleet success.
- Act as the voice of the customer, ensuring customer deployment experiences and operational insights influence future products, platforms, and solutions.
- Mentor and develop CPM leaders while building a high‑performance, customer‑focused culture.
PREFERRED EXPERIENCE:
- Experience leading customer‑facing engineering, technical program management, cloud infrastructure, AI infrastructure, HPC, datacenter operations, or related technical organizations.
- Experience leading technical teams responsible for customer deployments, operational readiness, and lifecycle support of complex infrastructure platforms.
- Experience supporting large‑scale cloud, AI, HPC, or enterprise infrastructure deployments from new product introduction through fleet sustainment.
- Strong understanding of datacenter infrastructure, compute platforms, AI systems, observability, telemetry, and operational readiness.
- Experience managing customer escalations and driving resolution of complex technical and operational issues.
- Experience working with hyperscalers, cloud providers, HPC customers, and sovereign AI customers.
- Proven ability to drive alignment across engineering, product, operations, and customer‑facing organizations.
- Strong communication and executive engagement skills.
ACADEMIC CREDENTIALS:
Bachelor's or Master's degree in Engineering, Computer Science, or a related technical field.
This role is not eligible for visa sponsorship.
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third‑party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.
This posting is for an existing vacancy.
Vacancy posted 3 days ago