Principal TPM - AI Infrastructure

$90.1k - $199.5k

Oracle

Job Description

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for OCI's rapidly expanding GPU infrastructure portfolio. As Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives.

You will own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs. This role requires strong program discipline, business analytics capability, and the ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans.

You will also help the organization scale by strengthening dashboards, telemetry, documentation, onboarding, playbooks, and repeatable processes, and by applying AI pragmatically to raise operations productivity. The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter.

You are a structured, data-driven program leader who values simplicity, scalability, reliability, and clear operational mechanisms. You thrive in collaborative environments, communicate crisply with senior stakeholders, and drive consistent execution through ownership, metrics, and disciplined follow-through. You combine strategic clarity with enough technical and operational depth to help teams deliver reliable OCI AI Infrastructure GPU Operations while continuously improving the processes, telemetry, and automation that support it.

Travel: as needed for cross-site coordination, stakeholder alignment, and partner engagements.

Responsibilities

GPU Fleet Operations & Reliability

  • Drive availability and reliability of large-scale GPU fleets, identifying systemic issues and leading cross-functional recovery efforts.

  • Support operational readiness and performance of distributed AI training and inference workloads across multi-region GPU clusters.

  • Lead GPU fleet health reviews across current and next-generation hardware, including NVIDIA H200, B200, and GB200/GB300; AMD Instinct MI300X, MI325X, MI350X, and MI355X; and related platforms.

Program Leadership & Execution

  • Own end-to-end execution of critical AI Infrastructure GPU Operations programs, ensuring alignment with business priorities, customer needs, and operational risk signals.

  • Set and run weekly operating cadences and governance forums across multiple concurrent initiatives, ensuring clear ownership, timelines, dependencies, decision points, and committed actions.

  • Coordinate cross-functional delivery across engineering, platform, operations, business operations, finance, observability, SRE, network, and senior leadership stakeholders.

Incident, Change & Deployment Governance

  • Manage deployment governance, change review, readiness tracking, stakeholder handoff, and operational execution processes.

  • Establish and scale structured incident management mechanisms, improving root cause analysis, corrective and preventive actions, and follow-through on durable fixes.

  • Serve as a primary escalation point between engineering and operations teams, resolving priority conflicts and accelerating issue resolution.

  • Lead Change Review Board processes for high-volume change activity, minimizing change-related incidents and protecting service quality.

Business Planning, Metrics & Executive Reporting

  • Build, model, and maintain business planning inputs, financial forecasts, analytical views, and operating reports for AI Infrastructure GPU Operations programs.

  • Own executive-level reporting, including monthly business reviews, weekly operational KPIs, critical project updates, risks, dependencies, decisions, and mitigation plans.

  • Provide data-driven insights into infrastructure performance, operational risk, customer impact, and measurable program outcomes for senior leadership.

Cross-Functional & Stakeholder Engagement

  • Strengthen partnerships with hardware vendors, cloud platform teams, SRE, cloud engineering, network teams, and other internal stakeholders to improve issue resolution and operational efficiency.

  • Translate complex technical, operational, and business situations into accurate narratives, recommendations, and action plans for senior stakeholders.

  • Drive structured escalation and bug reporting mechanisms that reduce time-to-resolution for critical issues.

Operational Excellence, Optimization & AI Productivity

  • Create and maintain documentation, playbooks, onboarding materials, runbooks, and repeatable processes that reduce ambiguity and improve execution quality.

  • Drive practical use of AI and automation to improve operations productivity, reduce manual toil, accelerate triage, improve ticket prioritization, and strengthen repeatability across GPU operations workflows.

  • Partner with observability and telemetry teams to improve infrastructure visibility, including RDMA telemetry, network fabric health, service health metrics, and operational dashboarding.

  • Lead continuous improvement efforts such as validation frameworks, version set validation, link flap analysis, and long-tail performance optimization.

  • Monitor and improve operational health across technologies such as RoCE, InfiniBand, and large-scale data center networks.

Qualifications / Experience

  • 5+ years of experience in technical program management, program operations, business operations, data analysis, infrastructure operations, or a related discipline.

  • Demonstrated ability to lead complex, cross-functional initiatives with measurable outcomes across technical, operations, business, and customer-facing stakeholders.

  • Strong operational background with experience building cadences, governance mechanisms, KPI reporting, incident/change processes, risk management processes, or readiness programs.

  • Strong written and verbal communication skills; comfortable synthesizing complex technical and operational information into executive updates, recommendations, and decisions.

  • A high degree of organization and the ability to independently manage multiple competing priorities amid ambiguity.

  • Experience identifying, measuring, and adjusting execution plans against key business, operational, reliability, or delivery metrics.

  • Advanced Excel skills, including pivots, lookups, conditional logic, data modeling, and financial or operational analysis.

  • Experience developing dashboards, automated reporting, or analytical tools that provide reliable business and operational visibility.

  • Working knowledge of PowerPoint, Jira, Confluence, and related collaboration or delivery management tools.

Preferred / Nice to Have

  • Experience with cloud infrastructure, AI/ML infrastructure, GPU operations, data center deployment, capacity planning, or large-scale platform operations.

  • Experience supporting large GPU fleets, distributed AI training or inference workloads, or performance-sensitive infrastructure environments.

  • Experience with incident management, root cause analysis, corrective and preventive action tracking, Change Review Board processes, or high-volume change governance.

  • Familiarity with observability, telemetry, RDMA, RoCE, InfiniBand, network fabric health, service health metrics, ticket/incident analytics, or operational dashboarding.

  • Finance, business planning, workforce planning, or operational readiness experience in a technology organization.

  • Track record of influencing senior business and technology leaders without relying on direct authority.

Disclaimer:

Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $90,100 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion

Short-term and long-term disability

Life insurance and AD&D

Supplemental life insurance (Employee/Spouse/Child)

Health care and dependent care Flexible Spending Accounts

Pre-tax commuter and parking benefits

401(k) Savings and Investment Plan with company match

Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

11 paid holidays

Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

Paid parental leave

Adoption assistance

Employee Stock Purchase Plan

Financial planning and group legal

Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC4

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email or by phone in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Vacancy posted 16 hours ago