Staff Machine Learning Engineer - ML Training Infrastructure
$185k - $335.3k
General Motors Proving Ground
Staff ML Engineer
The Role:
We are seeking an experienced, technically strong, impact-driven expert in ML Training Infrastructure with a demonstrated ability to lead through hands-on technical work. In this role, you will be responsible for defining the technical direction and driving the design and development of scalable, reliable, and high-performance AI/ML platform infrastructure that enables advanced AI research and model development at scale.
As a Staff ML Engineer, you will operate as a technical leader across initiatives, partnering closely with machine learning engineers, research scientists, and platform teams to shape architecture, drive major technical decisions, and deliver state-of-the-art AI infrastructure that enables the future of intelligent driving technologies across General Motors vehicles.
What You'll Do:
- Define and drive the architecture, design, and development of scalable, reliable, and high-performance ML frameworks and platform capabilities to support model training at scale.
- Lead model training performance analysis and optimization efforts across distributed training workflows, improving scalability, efficiency, and cost across heterogeneous hardware environments.
- Raise the bar on system observability, debuggability, operational excellence, and developer experience across the ML training stack.
- Own large, ambiguous, cross-functional technical initiatives from strategy through execution, including technical roadmap definition, tradeoff analysis, and delivery.
- Influence platform direction by identifying long-term infrastructure investments, setting engineering standards, and driving adoption of best practices across teams.
- Collaborate across organizational boundaries to align requirements, resolve technical disagreements, and integrate new capabilities into the platform ecosystem.
- Mentor engineers through design reviews, technical guidance, and hands-on partnership, while elevating engineering quality across the team.
Your Skills & Abilities (Required Qualifications):
- Bachelor's degree or higher in Computer Science or a related field, or equivalent practical experience.
- 7+ years of professional software engineering experience.
- 5+ years of specialized experience in AI/ML infrastructure, such as enabling distributed training for large-scale ML models.
- Strong programming skills in Python, with deep proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar ML systems.
- Proven experience designing and operating distributed systems for ML training, including distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
- Demonstrated track record of leading technically ambiguous, cross-team infrastructure initiatives and driving them to measurable impact.
- Strong architectural judgment and ability to make sound technical tradeoffs across performance, reliability, usability, and cost.
- Willingness to travel to Sunnyvale, CA as needed.
- Comfortable operating in highly ambiguous and dynamic environments.
What Will Give You a Competitive Edge (Preferred Qualifications):
- Deep expertise in PyTorch 2.x+ and distributed training frameworks.
- Experience designing and developing training platforms that support FSDP, pipeline parallelism, and other scalable solutions for training large foundational models.
- Experience profiling, analyzing, debugging, and optimizing training and data loading performance at scale.
- Strong record of technical leadership through architecture reviews, roadmap influence, and cross-team execution.
- Excellent communication skills, with the ability to build consensus, navigate controversial decisions, communicate risks clearly, and provide constructive technical feedback.
- Self-motivated, execution-oriented, and driven to deliver broad organizational impact.
Compensation:
- The salary range for this role is $185,000 to $335,300. The actual base salary offered to a successful candidate will fall within this range and will vary based on factors relevant to the position.
- Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.
Relocation: This job may be eligible for relocation benefits.
Benefits:
- GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.
Company Vehicle: Upon successful completion of a motor vehicle report review, you will be eligible to participate in a company vehicle evaluation program, through which you will be assigned a General Motors vehicle to drive and evaluate. Note: program participants are required to purchase/lease a qualifying GM vehicle every four years unless one of a limited number of exceptions applies.
This role is categorized as remote. This means the selected candidate may be based anywhere in the country of work and is not expected to report to a GM worksite unless directed by their manager.