Tech Lead, AI Compute Infrastructure
HeyGen
Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
About HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
Responsibilities
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
Minimum Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding of, and hands-on experience with, modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Preferred Qualifications
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, and NCCL or similar technologies for efficient inter-GPU communication.
What HeyGen Offers
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.