Principal ML Engineer: Large-Scale Training Performance
Advanced Micro Devices, Inc.
A leading technology firm is seeking a Principal Machine Learning Engineer in San Jose, CA. The role focuses on optimizing distributed training for large models, making significant contributions to AMD's AI platform. The ideal candidate should have expertise in distributed training algorithms and be proficient in frameworks like PyTorch or TensorFlow. This position offers a collaborative environment dedicated to innovation.
- ...beyond. Together, we advance your career. PMTS Large Scale Training Performance Optimization ENGINEER THE ROLE: We are looking for a Principal Machine Learning Engineer to join our... ...stakeholders. PREFERRED EXPERIENCE: Experience with ML/DL frameworks such as PyTorch, JAX, or...
Training, Performance
- .... Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The... ...on distributed training of large models and improve training... ...enhancing pipeline performance, contributing to open source... ...learning and experience with ML frameworks like PyTorch and...
Training, Performance
$224k - $356.5k
...Machine Learning and Simulation Engineers for their Autonomous... ...design and development of large-scale RL training frameworks to enhance multi... ...on simulation accuracy and performance. The ideal candidate has over... ...12 years of experience in ML training, simulating AV systems...
Training, Performance
$189k - $300k
...transportation on a global scale. The Data Scaling... ...on and delivers ML models to the... ...AV product performance through smart use... ...uses existing very large datasets that GM has... ...foundation model pre‑training and fine‑tuning with... ...team of AI/ML engineers, data scientists and...
Training, Performance, Local area, Remote work, Relocation, Relocation package, Flexible hours
$153.2k - $234.1k
...transportation on a global scale. Role Overview:... ...machine learning engineer working on our... ...the safety and performance of the car, rather... .... As a Senior ML Infra Engineer, you... ...learning model training and evaluation workflows... ...building large-scale distributed...
Training, Performance, Work at office, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$275.8k - $340.5k
...the team: The AV ML Infra team at GM builds... ...productivity of ML engineers, and drive the... ...Ensures robust model performance by running large-scale simulation workloads... ...and optimizes large-scale ML training and inference across... ...Overview: The Principal AI/ML Engineer will...
Training, Performance, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$296.3k
...the team: The AV ML Infra team at GM builds... ...productivity of ML engineers, and drive the... ...Ensures robust model performance by running large-scale simulation workloads... ...and optimizes large-scale ML training and inference across... ...Overview: The Principal AI/ML Engineer will...
Training, Performance, Local area, Work from home, Flexible hours
$296.3k
...The Role: We are seeking a Principal AI Engineer to lead the design and... ...infrastructure that powers large-scale training and cloud inference. This... ...build, and optimize core AI/ML platform infrastructure to... ...reliability, scalability, and performance across the AI/ML platform....
Training, Performance, Remote work, Flexible hours
$291.5k - $369.1k
...expertise with the scale and operational... ...and Cisco's global engineering capabilities. Our... ...following areas: large language modeling... ...Large-Scale Training & Optimization - Experience... ...production monitoring of ML models. ... ...sales plans earn performance-based incentive...
Training, Performance, Full time, Temporary work, Local area, Flexible hours
$206.4k - $384.68k
...are hiring a Director, ML Engineering to own the... ...services at enterprise scale. This is a multi-faceted... ...inference quality matches the training environment, and to... ...visions into reliable, high-performance ML systems that... ...engineering reality of large-scale inference: accelerator...
Training, Performance, Temporary work, Local area, Worldwide
$157.2k - $254.1k
...Machine Learning Engineer We are seeking... ...using generative AI, large language models (... ...systems to automate and scale our detection and... ...building, training, and deploying machine... ...record of taking ML projects from initial... ...Go, or Rust) for performance-critical...
Training, Performance
$291.5k - $369.1k
...expertise with the scale and operational... ...and Cisco’s global engineering capabilities. Our... ...and deployment for large‑scale foundation models... .... Large‑Scale Training & Optimization –... ...monitoring of ML models. Strong... ...sales plans earn performance-based incentive pay...
Training, Performance, Full time, Temporary work, Local area, Flexible hours
- ...company in Sunnyvale is seeking a Staff Software Engineer to drive AI/ML performance. The successful candidate will handle large-scale system design and optimization challenges,... ...teams to ensure peak performance of ML training and serving workflows, directly impacting...
Training, Performance
- ...Scientist specializing in privacy-preserving model training and architecture optimization in San Jose. The candidate will design and optimize large-scale training architectures for advanced generative models and lead performance optimization efforts across GPUs. This role...
Training, Performance
$180k
...While today's AI largely operates through chat... ...and manage large-scale GPU computing... ...clusters powering our AI training and deployment... ...intersection of systems engineering and machine... ...Partner closely with ML researchers and... ...level tooling and performance-critical services....
Training, Performance, Full time
$228.1k - $393.8k
...Machine Learning Engineering Manager – Ads Predictions... ...who has built and scaled complex machine... ...and deploying large-scale models, with... ...from data and model training pipelines to real-... ...are a hands-on ML leader who can drive... ...Lead and grow a high-performing team of ML...
Training, Performance, Relocation
$272k - $431.25k
...is seeking a Senior MLOps Engineering Manager to join our... ...development, and operation of large‑scale, end‑to‑end data and ML pipelines that power NVIDIA... ...radar—into high‑quality training, evaluation, and... ...Doing Lead and grow a high‑performing MLOps engineering group tasked...
Training, Performance
$181.1k - $318.4k
...Sr. ML Engineer, Siri User Experience Metrics and Data... ...identify regressions in Siri performance and alert engineering... ..., building and training ML models using distributed... .... Experience applying large language models (LLMs)... ..., and large‑scale operations, including...
Training, Performance, Relocation
$144.7k - $261.3k
..., cloud infrastructure, and ML/AI GPU platforms for AV research... ...GM is looking for a Senior Performance Engineer to join the AV Capacity and... ...is to provide input into large scale ML infrastructure strategy,... ...reliability of large-scale ML training and inference environments....
Training, Performance, Local area, Remote work, Work from home, Flexible hours, 3 days per week
- ...ML Engineer / Generalist HypeLab is a small, profitable... ...at real marketplace scale. We process more than... ...strong service, and better performance matter. Our customers... ...not be tucked away training models with no connection... ...and do not need a large team around you to make...
Training, Performance
- ...Staff/Sr. ML Compute Efficiency Engineer Scaling machine learning workloads across thousands of GPUs and... ...the infrastructure that powers large-scale ML training and inference workloads, bringing... ...infrastructure, and high-performance computing. As a performance engineer...
Training, Performance
- ...to optimize resource usage for training and fine-tuning models, ensuring high performance while maintaining efficiency.... ...of operational expenses through large-scale adoption. Responsibilities... ...functional teams, including product, engineering, and business stakeholders, to...
Training, Performance
$156k - $316.8k
...Research Scientist — Privacy-Preserving Large-Scale Model Training & Architecture Optimization Location:... ...diffusion systems). Lead GPU-centric performance optimization, including memory layout,... ...Qualifications: Experience with privacy-preserving ML, sensitive data training, or regulated...
Training, Performance, Temporary work, Local area
$189.3k - $320.7k
...Design and implement ML solutions aligned... ...as unsupervised pre‑training, supervised learning, model scaling/selection, and... ...systematic use of GM’s large‑scale datasets, utilizing... ...Support and mentor engineers through technical... ...linked to company performance, job level, and...
Training, Performance, Local area, Remote work, Flexible hours
$170.7k - $300.2k
...Group is looking for engineers to work on developing... ...with experts in high-performance computing, machine learning... ...proficiency in ML modeling frameworks, experience... ...and integrating large-scale distributed machine learning... ..., improving GPU training approaches, developing...
Training, Performance
$147k - $211k
Google Inc. is seeking a skilled ML Compiler Software Engineer for its Sunnyvale office. The position requires a Bachelor's degree, proficiency... ...and collaborating with cross-functional teams to maximize performance. The US base salary for this full-time role is between $1...
Performance, Full time, Work at office
$272k - $431.25k
...Principal Deep Learning Senior Engineer, End-To-End Autonomous Driving... ...Doing: Design and train innovative large-scale models—including generative... ...environments, ensuring performance, safety, and... ...deploying production-grade ML models for self-driving...
Training, Performance, Work experience placement
$181.1k - $272.1k
...ML Infrastructure Engineer - Multimodal Training Tools, SIML... ...Are you passionate... ..., adapting and deploying large-scale generative models. You will be working... ...training and inference performance Integrating efficient data loading...
Training, Performance, Relocation
$272k - $431.25k
...establish best practices for training and evaluation, using techniques such as large-scale pretraining,... ...to quantify perception performance; analyze large-scale real... ...mentorship to other engineers, influencing design and... ...leadership as a senior or principal‑level individual...
Training, Performance
- ...GPUs. Our novel wafer-scale architecture provides... ...industry-leading training and inference speeds... ...to effortlessly run large-scale ML applications, without... ...The Inference ML Engineering team at Cerebras Systems... ...platform, leveraging its performance, scalability, and flexibility...
Training, Performance

