Tech Lead, AI Compute Infrastructure

HeyGen

Los Angeles, Palo Alto, San Francisco, Toronto, Singapore

About HeyGen

At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.

We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.

Responsibilities

You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.

  • Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.

  • Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including high-volume multimodal data ingestion and processing, distributed model training, and continuous evaluation and benchmarking.

  • Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).

  • Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.

  • Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Minimum Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

  • 5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.

  • Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.

  • Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.

  • Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.

  • Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Preferred Qualifications
  • Master's or PhD in Computer Science or a related technical field.

  • Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.

  • Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.

  • Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).

  • Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication.

What HeyGen Offers
  • Competitive salary and benefits package.
  • Dynamic and inclusive work environment.
  • Opportunities for professional growth and advancement.
  • Collaborative culture that values innovation and creativity.
  • Access to the latest technologies and tools.

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Vacancy posted 1 day ago