Principal TPM - AI Infrastructure

$90.1k - $199.5k

Oracle

Job Description

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for OCI's rapidly expanding GPU infrastructure portfolio. As Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives.

You will own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs. This role requires strong program discipline, business analytics capability, and the ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans.

You will also improve the way the organization scales by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity. The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter.

You are a structured, data-driven program leader who values simplicity, scalability, reliability, and clear operational mechanisms. You thrive in collaborative environments, communicate crisply with senior stakeholders, and drive consistent execution through ownership, metrics, and disciplined follow-through. You combine strategic clarity with enough technical and operational depth to help teams deliver reliable OCI AI Infrastructure GPU Operations while continuously improving the processes, telemetry, and automation that support it.

Travel: as needed for cross-site coordination, stakeholder alignment, and partner engagements.

Responsibilities

GPU Fleet Operations & Reliability

  • Drive availability and reliability of large-scale GPU fleets, identifying systemic issues and leading cross-functional recovery efforts.

  • Support operational readiness and performance of distributed AI training and inference workloads across multi-region GPU clusters.

  • Lead GPU fleet health reviews across current and next-generation hardware, including NVIDIA H200, B200, GB200/GB300 platforms and AMD Instinct MI300X, MI325X, MI350X, MI355X, and related platforms.

Program Leadership & Execution

  • Own end-to-end execution of critical AI Infrastructure GPU Operations programs, ensuring alignment with business priorities, customer needs, and operational risk signals.

  • Set and run weekly operating cadences and governance forums across multiple concurrent initiatives, ensuring clear ownership, timelines, dependencies, decision points, and committed actions.

  • Coordinate cross-functional delivery across engineering, platform, operations, business operations, finance, observability, SRE, network, and senior leadership stakeholders.

Incident, Change & Deployment Governance

  • Manage deployment governance, change review, readiness tracking, stakeholder handoff, and operational execution processes.

  • Establish and scale structured incident management mechanisms, improving root cause analysis, corrective and preventive actions, and follow-through on durable fixes.

  • Serve as a primary escalation point between engineering and operations teams, resolving priority conflicts and accelerating issue resolution.

  • Lead Change Review Board processes for high-volume change activity, minimizing change-related incidents and protecting service quality.

Business Planning, Metrics & Executive Reporting

  • Build, model, and maintain business planning inputs, financial forecasts, analytical views, and operating reports for AI Infrastructure GPU Operations programs.

  • Own executive-level reporting, including monthly business reviews, weekly operational KPIs, critical project updates, risks, dependencies, decisions, and mitigation plans.

  • Provide data-driven insights into infrastructure performance, operational risk, customer impact, and measurable program outcomes for senior leadership.

Cross-Functional & Stakeholder Engagement

  • Strengthen partnerships with hardware vendors, cloud platform teams, SRE, cloud engineering, network teams, and other internal stakeholders to improve issue resolution and operational efficiency.

  • Translate complex technical, operational, and business situations into accurate narratives, recommendations, and action plans for senior stakeholders.

  • Drive structured escalation and bug reporting mechanisms that reduce time-to-resolution for critical issues.

Operational Excellence, Optimization & AI Productivity

  • Create and maintain documentation, playbooks, onboarding materials, runbooks, and repeatable processes that reduce ambiguity and improve execution quality.

  • Drive practical use of AI and automation to improve operations productivity, reduce manual toil, accelerate triage, improve ticket prioritization, and strengthen repeatability across GPU operations workflows.

  • Partner with observability and telemetry teams to improve infrastructure visibility, including RDMA telemetry, network fabric health, service health metrics, and operational dashboarding.

  • Lead continuous improvement efforts such as validation frameworks, version set validation, link flap analysis, and long-tail performance optimization.

  • Monitor and improve operational health across technologies such as RoCE, InfiniBand, and large-scale data center networks.

Qualifications / Experience

  • 5+ years of experience in technical program management, program operations, business operations, data analysis, infrastructure operations, or a related discipline.

  • Demonstrated ability to lead complex, cross-functional initiatives with measurable outcomes across technical, operations, business, and customer-facing stakeholders.

  • Strong operational background with experience building cadences, governance mechanisms, KPI reporting, incident/change processes, risk management processes, or readiness programs.

  • Strong written and verbal communication skills; comfortable synthesizing complex technical and operational information into executive updates, recommendations, and decisions.

  • A high degree of organization and ability to manage multiple competing priorities independently through ambiguity.

  • Experience identifying, measuring, and adjusting execution plans against key business, operational, reliability, or delivery metrics.

  • Advanced Excel skills, including pivots, lookups, conditional logic, data modeling, and financial or operational analysis.

  • Experience developing dashboards, automated reporting, or analytical tools that provide reliable business and operational visibility.

  • Working knowledge of PowerPoint, Jira, Confluence, and related collaboration or delivery management tools.

Preferred / Nice to Have

  • Experience with cloud infrastructure, AI/ML infrastructure, GPU operations, data center deployment, capacity planning, or large-scale platform operations.

  • Experience supporting large GPU fleets, distributed AI training or inference workloads, or performance-sensitive infrastructure environments.

  • Experience with incident management, root cause analysis, corrective and preventive action tracking, Change Review Board processes, or high-volume change governance.

  • Familiarity with observability, telemetry, RDMA, RoCE, InfiniBand, network fabric health, service health metrics, ticket/incident analytics, or operational dashboarding.

  • Finance, business planning, workforce planning, or operational readiness experience in a technology organization.

  • Track record of influencing senior business and technology leaders without relying on direct authority.

Disclaimer:

Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $90,100 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  • Medical, dental, and vision insurance, including expert medical opinion

  • Short term disability and long term disability

  • Life insurance and AD&D

  • Supplemental life insurance (Employee/Spouse/Child)

  • Health care and dependent care Flexible Spending Accounts

  • Pre-tax commuter and parking benefits

  • 401(k) Savings and Investment Plan with company match

  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  • 11 paid holidays

  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  • Paid parental leave

  • Adoption assistance

  • Employee Stock Purchase Plan

  • Financial planning and group legal

  • Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC4

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email address not shown] or by calling [phone number not shown] in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Vacancy posted 6 hours ago