Staff Machine Learning Engineer - ML Training Infrastructure

$185k - $335.3k

General Motors Proving Ground

Staff ML Engineer

The Role:

We are seeking an experienced, technically strong, impact-driven expert in ML Training Infrastructure with a demonstrated ability to lead through hands-on technical work. In this role, you will be responsible for defining the technical direction and driving the design and development of scalable, reliable, and high-performance AI/ML platform infrastructure that enables advanced AI research and model development at scale.

As a Staff ML Engineer, you will operate as a technical leader across initiatives, partnering closely with machine learning engineers, research scientists, and platform teams to shape architecture, drive major technical decisions, and deliver state-of-the-art AI infrastructure that enables the future of intelligent driving technologies across General Motors vehicles.

What You'll Do:

  • Define and drive the architecture, design, and development of scalable, reliable, and high-performance ML frameworks and platform capabilities to support model training at scale.
  • Lead model training performance analysis and optimization efforts across distributed training workflows, improving scalability, efficiency, and cost across heterogeneous hardware environments.
  • Raise the bar on system observability, debuggability, operational excellence, and developer experience across the ML training stack.
  • Own large, ambiguous, cross-functional technical initiatives from strategy through execution, including technical roadmap definition, tradeoff analysis, and delivery.
  • Influence platform direction by identifying long-term infrastructure investments, setting engineering standards, and driving adoption of best practices across teams.
  • Collaborate across organizational boundaries to align requirements, resolve technical disagreements, and integrate new capabilities into the platform ecosystem.
  • Mentor engineers through design reviews, technical guidance, and hands-on partnership, while elevating engineering quality across the team.

Your Skills & Abilities (Required Qualifications)

  • Bachelor's degree or higher in Computer Science or a related field, or equivalent practical experience.
  • 7+ years of professional software engineering experience.
  • 5+ years of specialized experience in AI/ML infrastructure, such as enabling distributed training for large-scale ML models.
  • Strong programming skills in Python, with deep proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar ML systems.
  • Proven experience designing and operating distributed systems for ML training, including distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
  • Demonstrated track record of leading technically ambiguous, cross-team infrastructure initiatives and driving them to measurable impact.
  • Strong architectural judgment and ability to make sound technical tradeoffs across performance, reliability, usability, and cost.
  • Willingness to travel to Sunnyvale, CA as needed.
  • Comfortable operating in highly ambiguous and dynamic environments.

What Will Give You a Competitive Edge (Preferred Qualifications):

  • Deep expertise in PyTorch 2.x+ and distributed training frameworks.
  • Experience designing and developing training platforms that support FSDP, pipeline parallelism, and other scalable solutions for training large foundational models.
  • Experience profiling, analyzing, debugging, and optimizing training and data loading performance at scale.
  • Strong record of technical leadership through architecture reviews, roadmap influence, and cross-team execution.
  • Excellent communication skills, with the ability to build consensus, navigate controversial decisions, communicate risks clearly, and provide constructive technical feedback.
  • Self-motivated, execution-oriented, and driven to deliver broad organizational impact.

Compensation:

  • The salary range for this role is $185,000 to $335,300. The actual base salary a successful candidate will be offered within this range will vary based on factors relevant to the position.
  • Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.

Relocation: This job may be eligible for relocation benefits.

Benefits:

  • GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.

Company Vehicle: Upon successful completion of a motor vehicle report review, you will be eligible to participate in a company vehicle evaluation program, through which you will be assigned a General Motors vehicle to drive and evaluate. Note: program participants are required to purchase/lease a qualifying GM vehicle every four years unless one of a limited number of exceptions applies.

This role is categorized as remote. This means the selected candidate may be based anywhere in the country of work and is not expected to report to a GM worksite unless directed by their manager.

Vacancy posted 3 days ago
