Tech Lead, AI Compute Infrastructure
HeyGen
Los Angeles, Palo Alto, San Francisco, Toronto, Singapore
About HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
Responsibilities
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
Minimum Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Preferred Qualifications
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.
What HeyGen Offers
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
$164.2k - $205.2k

