Tech Lead, AI Compute Infrastructure

HeyGen

Los Angeles, Palo Alto, San Francisco, Toronto, Singapore

About HeyGen

At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.

We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.

Responsibilities

You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.

  • Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.

  • Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.

  • Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).

  • Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.

  • Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Minimum Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

  • 5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.

  • Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.

  • Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.

  • Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.

  • Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Preferred Qualifications
  • Master's or PhD in Computer Science or a related technical field.

  • Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.

  • Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.

  • Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).

  • Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.

What HeyGen Offers
  • Competitive salary and benefits package.
  • Dynamic and inclusive work environment.
  • Opportunities for professional growth and advancement.
  • Collaborative culture that values innovation and creativity.
  • Access to the latest technologies and tools.

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Vacancy posted 5 days ago