Tech Lead, AI Compute Infrastructure

HeyGen

Los Angeles, Palo Alto, San Francisco, Toronto, Singapore

About HeyGen

At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.

We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.

Responsibilities

You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.

Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including high-volume multimodal data ingestion and processing, distributed model training, and continuous evaluation and benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Minimum Requirements

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Preferred Qualifications

Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multimodal data (video, audio, text).
Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.

What HeyGen Offers

Competitive salary and benefits package.
Dynamic and inclusive work environment.
Opportunities for professional growth and advancement.
Collaborative culture that values innovation and creativity.
Access to the latest technologies and tools.

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Vacancy posted 2 days ago
