Tech Lead, AI Compute Infrastructure
HeyGen
Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
About HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
Responsibilities
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
Minimum Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Preferred Qualifications
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.
What HeyGen Offers
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

