Tech Lead, AI Compute Infrastructure
HeyGen
Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
About HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
Responsibilities
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
Minimum Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Preferred Qualifications
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.
What HeyGen Offers
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


