
Tech Lead, Deployment & Operations - Custom Infrastructure

$342k

OpenAI

About the Team

OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.

About the Role

We are seeking a Technical Lead to head deployment and operations for OpenAI’s Silicon & Systems team. This person will be the Directly Responsible Individual (DRI) for bringing OpenAI’s custom silicon and associated systems into data center environments, ensuring successful deployment, bring-up, validation, operational readiness, and ongoing reliability at scale.

This role sits at the intersection of silicon, systems, infrastructure, data center operations, and software. You will lead a team focused on taking new hardware platforms from lab validation into production data center deployment. You will be responsible for building the operational processes, technical workflows, tooling, and cross-functional alignment required to deploy and operate custom AI hardware reliably in OpenAI’s supercomputing infrastructure.

The ideal candidate is both a strong leader and a deeply technical operator. You should be comfortable staying close to the technical details of hardware bring-up, fleet deployment, debugging, system validation, data center integration, and production operations. This role requires strong execution, excellent cross-functional judgment, and the ability to drive clarity in ambiguous, fast-moving environments.
In this role, you will:
  • Lead a team responsible for deployment and operations of OpenAI’s custom silicon and systems in data center environments
  • Own the path from hardware bring-up and validation through production deployment, operational readiness, and sustained fleet support
  • Partner closely with silicon, systems, software, infrastructure, networking, data center, supply chain, and external partner teams to ensure successful deployment at scale
  • Define deployment processes, operational playbooks, technical readiness criteria, escalation paths, and reliability practices for new hardware platforms
  • Drive cross-functional execution across lab bring-up, rack/system integration, data center deployment, fleet monitoring, debugging, and issue resolution
  • Stay hands-on technically through architecture reviews, deployment planning, failure analysis, operational debugging, and critical system-level decision-making
  • Identify gaps in tooling, observability, automation, validation coverage, and operational processes, and build plans to close them
  • Establish clear metrics for deployment readiness, reliability, performance, maintainability, and operational health
  • Build a strong engineering culture grounded in ownership, technical rigor, operational excellence, and high-velocity execution
  • Ensure OpenAI’s custom hardware platforms can be deployed and operated reliably, repeatably, and safely at scale
  • Contribute to, and help drive, the architecture and design of future ML systems

You might thrive in this role if you:
  • Enjoy mentoring and developing engineers while staying deeply engaged in technical execution
  • Are excited by the challenge of bringing new custom hardware platforms into real-world production data center environments
  • Can operate across silicon, systems, software, infrastructure, and data center operations
  • Are comfortable leading through ambiguity, especially when the hardware, tooling, and operational model are still being built
  • Have strong judgment around deployment sequencing, technical risk, operational readiness, and when to escalate issues
  • Communicate clearly across technical and operational teams, and can align stakeholders through complex deployment and production issues
  • Care deeply about building practical systems, tools, and processes that work reliably at scale
  • Have a bias toward ownership and are comfortable jumping into urgent technical issues when needed

Qualifications
  • 8+ years of engineering experience in hardware systems, infrastructure, data center deployment, production operations, systems engineering, silicon bring-up, or related technical domains
  • Strong technical depth in one or more of: hardware deployment, data center operations, rack-scale systems, silicon bring-up, systems validation, fleet operations, reliability engineering, infrastructure automation, or hardware/software integration
  • Experience bringing complex hardware systems from development or validation into production environments
  • Experience working closely with silicon, systems, software, infrastructure, networking, or data center teams
  • Experience with deployment planning, operational readiness, incident response, debugging, and root-cause analysis for production systems
  • Experience building tooling, automation, observability, or operational processes that improve deployment quality and fleet reliability
  • Demonstrated ability to hire, develop, and lead senior technical talent
  • Ability to move fluidly between people leadership, technical strategy, and hands-on operational problem solving
  • Strong written and verbal communication skills, especially in high-urgency, cross-functional technical environments
  • Experience working in fast-moving environments

Compensation Range: $342K - $445K USD

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates.

For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

Vacancy posted 4 days ago
Location: San Francisco, CA