Principal TPM - AI Infrastructure

$90.1k - $199.5k

Oracle

Job Description

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for the rapidly expanding GPU infrastructure portfolio of Oracle Cloud Infrastructure (OCI). As Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives.

You will own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs. This role requires strong program discipline, business analytics capability, and the ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans.

You will also improve the way the organization scales by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity. The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter.

You are a structured, data-driven program leader who values simplicity, scalability, reliability, and clear operational mechanisms. You thrive in collaborative environments, communicate crisply with senior stakeholders, and drive consistent execution through ownership, metrics, and disciplined follow-through. You combine strategic clarity with enough technical and operational depth to help teams deliver reliable OCI AI Infrastructure GPU Operations while continuously improving the processes, telemetry, and automation that support it.

Travel: as needed for cross-site coordination, stakeholder alignment, and partner engagements.

Responsibilities

GPU Fleet Operations & Reliability

  • Drive availability and reliability of large-scale GPU fleets, identifying systemic issues and leading cross-functional recovery efforts.

  • Support operational readiness and performance of distributed AI training and inference workloads across multi-region GPU clusters.

  • Lead GPU fleet health reviews across current and next-generation hardware, including NVIDIA H200, B200, and GB200/GB300 platforms, AMD Instinct MI300X, MI325X, MI350X, and MI355X accelerators, and related platforms.

Program Leadership & Execution

  • Own end-to-end execution of critical AI Infrastructure GPU Operations programs, ensuring alignment with business priorities, customer needs, and operational risk signals.

  • Set and run weekly operating cadences and governance forums across multiple concurrent initiatives, ensuring clear ownership, timelines, dependencies, decision points, and committed actions.

  • Coordinate cross-functional delivery across engineering, platform, operations, business operations, finance, observability, SRE, network, and senior leadership stakeholders.

Incident, Change & Deployment Governance

  • Manage deployment governance, change review, readiness tracking, stakeholder handoff, and operational execution processes.

  • Establish and scale structured incident management mechanisms, improving root cause analysis, corrective and preventive actions, and follow-through on durable fixes.

  • Serve as a primary escalation point between engineering and operations teams, resolving priority conflicts and accelerating issue resolution.

  • Lead Change Review Board processes for high-volume change activity, minimizing change-related incidents and protecting service quality.

Business Planning, Metrics & Executive Reporting

  • Build, model, and maintain business planning inputs, financial forecasts, analytical views, and operating reports for AI Infrastructure GPU Operations programs.

  • Own executive-level reporting, including monthly business reviews, weekly operational KPIs, critical project updates, risks, dependencies, decisions, and mitigation plans.

  • Provide data-driven insights into infrastructure performance, operational risk, customer impact, and measurable program outcomes for senior leadership.

Cross-Functional & Stakeholder Engagement

  • Strengthen partnerships with hardware vendors, cloud platform teams, SRE, cloud engineering, network teams, and other internal stakeholders to improve issue resolution and operational efficiency.

  • Translate complex technical, operational, and business situations into accurate narratives, recommendations, and action plans for senior stakeholders.

  • Drive structured escalation and bug reporting mechanisms that reduce time-to-resolution for critical issues.

Operational Excellence, Optimization & AI Productivity

  • Create and maintain documentation, playbooks, onboarding materials, runbooks, and repeatable processes that reduce ambiguity and improve execution quality.

  • Drive practical use of AI and automation to improve operations productivity, reduce manual toil, accelerate triage, improve ticket prioritization, and strengthen repeatability across GPU operations workflows.

  • Partner with observability and telemetry teams to improve infrastructure visibility, including RDMA telemetry, network fabric health, service health metrics, and operational dashboarding.

  • Lead continuous improvement efforts such as validation frameworks, version set validation, link flap analysis, and long-tail performance optimization.

  • Monitor and improve operational health across technologies such as RoCE, InfiniBand, and large-scale data center networks.

Qualifications / Experience

  • 5+ years of experience in technical program management, program operations, business operations, data analysis, infrastructure operations, or a related discipline.

  • Demonstrated ability to lead complex, cross-functional initiatives with measurable outcomes across technical, operations, business, and customer-facing stakeholders.

  • Strong operational background with experience building cadences, governance mechanisms, KPI reporting, incident/change processes, risk management processes, or readiness programs.

  • Strong written and verbal communication skills; comfortable synthesizing complex technical and operational information into executive updates, recommendations, and decisions.

  • Highly organized, with the ability to manage multiple competing priorities independently and navigate ambiguity.

  • Experience identifying, measuring, and adjusting execution plans against key business, operational, reliability, or delivery metrics.

  • Advanced Excel skills, including pivots, lookups, conditional logic, data modeling, and financial or operational analysis.

  • Experience developing dashboards, automated reporting, or analytical tools that provide reliable business and operational visibility.

  • Working knowledge of PowerPoint, Jira, Confluence, and related collaboration or delivery management tools.

Preferred / Nice to Have

  • Experience with cloud infrastructure, AI/ML infrastructure, GPU operations, data center deployment, capacity planning, or large-scale platform operations.

  • Experience supporting large GPU fleets, distributed AI training or inference workloads, or performance-sensitive infrastructure environments.

  • Experience with incident management, root cause analysis, corrective and preventive action tracking, Change Review Board processes, or high-volume change governance.

  • Familiarity with observability, telemetry, RDMA, RoCE, InfiniBand, network fabric health, service health metrics, ticket/incident analytics, or operational dashboarding.

  • Finance, business planning, workforce planning, or operational readiness experience in a technology organization.

  • Track record of influencing senior business and technology leaders without relying on direct authority.

Disclaimer:

Certain U.S.-based or U.S. customer- or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $90,100 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  • Medical, dental, and vision insurance, including expert medical opinion

  • Short-term disability and long-term disability

  • Life insurance and AD&D

  • Supplemental life insurance (Employee/Spouse/Child)

  • Health care and dependent care Flexible Spending Accounts

  • Pre-tax commuter and parking benefits

  • 401(k) Savings and Investment Plan with company match

  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  • 11 paid holidays

  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  • Paid parental leave

  • Adoption assistance

  • Employee Stock Purchase Plan

  • Financial planning and group legal

  • Voluntary benefits including auto, homeowner, and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC4

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email or by phone in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Location: Columbus, OH