Tech Lead, AI Compute Infrastructure

HeyGen

Los Angeles, Palo Alto, San Francisco, Toronto, Singapore

About HeyGen

At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.

We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.

Responsibilities

You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.

  • Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.

  • Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.

  • Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).

  • Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.

  • Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Minimum Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

  • 5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.

  • Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.

  • Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.

  • Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.

  • Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Preferred Qualifications
  • Master's or PhD in Computer Science or a related technical field.

  • Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.

  • Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.

  • Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).

  • Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.

What HeyGen Offers
  • Competitive salary and benefits package.
  • Dynamic and inclusive work environment.
  • Opportunities for professional growth and advancement.
  • Collaborative culture that values innovation and creativity.
  • Access to the latest technologies and tools.

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Vacancy posted 1 day ago