Director - Hyperscale, HPC & Sovereign AI Deployment and Fleet Operations

Advanced Micro Devices

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next‑generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

As the Director of Cloud, HPC & Sovereign AI Customer Engineering within the Compute & Enterprise AI Solutions Customer Engineering organization, you will lead the team responsible for enabling successful deployment, adoption, and lifecycle support of AMD compute and AI solutions across Cloud, HPC, and Sovereign AI customers. This highly visible leadership role oversees a team of Customer Program Managers (CPMs) responsible for guiding customers through new product introductions, deployment readiness, production ramp, and long‑term fleet sustainment. Working closely with customers, Customer Platform Engineering, Product Management, Engineering, and Architecture, you will drive successful customer outcomes across the full lifecycle of AMD compute and AI platforms.

THE PERSON:

The ideal candidate is a strong technical and organizational leader with experience supporting large‑scale cloud, AI, HPC, or datacenter deployments. You have a proven track record of leading customer‑facing technical teams, managing complex customer engagements, and driving successful deployment and sustainment of infrastructure at scale. You are equally comfortable engaging with customer executives, architects, operations teams, and engineering organizations while driving alignment across AMD. You possess strong technical credibility, customer advocacy skills, and the ability to lead through influence in complex environments.

KEY RESPONSIBILITIES:

  • Lead and scale the Cloud, HPC & Sovereign AI Customer Engineering organization supporting strategic customer deployments and lifecycle management.
  • Lead a team of Customer Program Managers responsible for customer engagement, deployment readiness, new product introduction (NPI) execution, production ramp, and fleet sustainment activities.
  • Drive successful deployment, adoption, and operational readiness of AMD compute and AI solutions across Cloud, HPC, and Sovereign AI customers.
  • Serve as the executive escalation point for strategic customer issues and drive resolution of complex deployment, platform, performance, and operational challenges.
  • Partner closely with Customer Platform Engineering teams, including PAE, BAE, Security Engineering, and Debug Engineering, to ensure successful customer outcomes.
  • Develop deployment methodologies, operational best practices, and customer engagement frameworks that accelerate customer time‑to‑production.
  • Drive fleet sustainment strategies including observability, telemetry, remote diagnostics, lifecycle management, and operational readiness.
  • Partner with customers and AMD engineering teams to support successful platform deployment and long‑term fleet success.
  • Act as the voice of the customer, ensuring customer deployment experiences and operational insights influence future products, platforms, and solutions.
  • Mentor and develop CPM leaders while building a high‑performance, customer‑focused culture.

PREFERRED EXPERIENCE:

  • Experience leading customer‑facing engineering, technical program management, cloud infrastructure, AI infrastructure, HPC, datacenter operations, or related technical organizations.
  • Experience leading technical teams responsible for customer deployments, operational readiness, and lifecycle support of complex infrastructure platforms.
  • Experience supporting large‑scale cloud, AI, HPC, or enterprise infrastructure deployments from new product introduction through fleet sustainment.
  • Strong understanding of datacenter infrastructure, compute platforms, AI systems, observability, telemetry, and operational readiness.
  • Experience managing customer escalations and driving resolution of complex technical and operational issues.
  • Experience working with hyperscalers, cloud providers, HPC customers, and sovereign AI customers.
  • Proven ability to drive alignment across engineering, product, operations, and customer‑facing organizations.
  • Strong communication and executive engagement skills.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in Engineering, Computer Science, or a related technical field.

This role is not eligible for visa sponsorship.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third‑party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Vacancy posted 3 days ago