Principal ML Engineer: Large-Scale Training Performance
Advanced Micro Devices
A leading technology firm is seeking a Principal Machine Learning Engineer in San Jose, CA. The role focuses on optimizing distributed training for large models, making significant contributions to AMD's AI platform. The ideal candidate should have expertise in distributed training algorithms and be proficient in frameworks like PyTorch or TensorFlow. This position offers a collaborative environment dedicated to innovation.
- ...are looking for a Principal Machine Learning Engineer to join our Models... ...challenge of distributed training of large models on a large... ...generative AI at scale. THE PERSON:... ...training pipeline performance. Optimize the distributed... ...Experience with ML/DL frameworks such...TrainingPerformance
$136.5k - $253.5k
...skyrocketed due to increasing performance demands from AI. We are a... ...team of software developers, ML scientists, and research-minded engineers on a mission to change that.... ...learning engineers experienced in training large language models at scale, as well as accomplished...TrainingPerformance$136.5k - $253.5k
...skyrocketed due to increasing performance demands from AI. We are a... ...team of software developers, ML scientists, and research-minded engineers on a mission to change that.... ...learning engineers experienced in training large language models at scale, as well as accomplished...TrainingPerformance$220k - $245k
...natural advantages for scale: photons don't feel... ...interrelated areas: training and deploying ML models that operate directly... ...in exploring large, complex design spaces... ...training pipelines for high-performance model development.... ...with physicists and engineers to translate quantum...TrainingPerformanceFull timeShift work$159.3k - $230.7k
...transportation on a global scale. The Data... ...on and delivers ML models to the... ...impacting AV product performance through smart use... ...uses existing very large datasets that GM has... ...model pre-training and fine-tuning with... ...impact team of AI/ML engineers, data scientists and...TrainingPerformanceLocal areaRemote workWork from homeRelocation packageFlexible hours
- .... Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The... ...on distributed training of large models and improve training... ...enhancing pipeline performance, contributing to open source... ...learning and experience with ML frameworks like PyTorch and...TrainingPerformance
$153.2k - $234.1k
...transportation on a global scale. Role Overview:... ...machine learning engineer working on our... ...the safety and performance of the car, rather... .... As a Senior ML Infra Engineer, you... ...learning model training and evaluation workflows... ...building large-scale distributed...TrainingPerformanceWork at officeLocal areaRemote workWork from homeRelocationRelocation packageFlexible hours$275.8k - $340.5k
...the team: The AV ML Infra team at GM builds... ...productivity of ML engineers, and drive the... ...Ensures robust model performance by running large-scale simulation workloads... ...and optimizes large-scale ML training and inference across... ...Overview: The Principal AI/ML Engineer will...TrainingPerformanceLocal areaRemote workWork from homeRelocationRelocation packageFlexible hours$296.3k
...the team: The AV ML Infra team at GM builds... ...productivity of ML engineers, and drive the... ...Ensures robust model performance by running large-scale simulation workloads... ...and optimizes large-scale ML training and inference across... ...Overview: The Principal AI/ML Engineer will...TrainingPerformanceLocal areaWork from homeFlexible hours$272k - $431.25k
...looking for a Machine Learning (ML) Engineer to join the GPU accelerated... ...engine in data centers for running large scale workloads for ETL, SQL, and ML/DL model training and inference pipelines,... ...machine learning solutions for performance prediction and optimization of...TrainingPerformance$157.2k - $254.1k
...Machine Learning Engineer to join our pioneering... ...generative AI, large language models (LLMs... ...to automate and scale our detection and... ...experience building, training, and deploying... ...record of taking ML projects from initial... ...Go, or Rust) for performance-critical components...TrainingPerformanceFull timeWork at office$296.3k
...Role: We are seeking a Principal AI Engineer to lead the design and advancement... ...that powers large-scale training and cloud inference. This... ...build, and optimize core AI/ML platform infrastructure to... ...reliability, scalability, and performance across the AI/ML platform....TrainingPerformanceLocal areaRemote workWork from homeFlexible hours$182k - $260k
..., faster. We build high-performing teams that can make an impact... ...We are looking for a Principal Machine Learning Engineer to join our ML/AI team. This is a... ...security challenges at scale. What you’ll do (Role... ...and relevant education or training. The base salary range...TrainingPerformanceFull timeWork at officeLocal areaWorldwide$228.1k - $393.8k
...Machine Learning Engineering Manager – Ads Predictions... ...who has built and scaled complex machine... ...and deploying large-scale models, with... ...from data and model training pipelines to real-... ...are a hands-on ML leader who can drive... ...Lead and grow a high-performing, team of ML...TrainingPerformanceRelocation$180k
...While today's AI largely operates through chat... ...and manage large-scale GPU computing... ...clusters powering our AI training and deployment... ...intersection of systems engineering and machine... ...Partner closely with ML researchers and... ...level tooling and performance-critical services....TrainingPerformanceFull time
- ...ML Engineer / Generalist HypeLab is a small, profitable... ...at real marketplace scale. We process more than... ...strong service, and better performance matter. Our customers... ...not be tucked away training models with no connection... ...and do not need a large team around you to make...TrainingPerformance
$181.1k - $318.4k
...Sr. ML Engineer, Siri User Experience Metrics and Data... ...identify regressions in Siri performance and alert engineering... ..., building and training ML models using distributed... ...Experience applying large language models (LLMs)... ...infrastructure, and large-scale operations, including...TrainingPerformanceRelocation$155.42k - $395.9k
...About the Team: The ML Compute Platform is... ...supports the training and deployment of state... ...with a focus on performance, availability, concurrency... ...a Senior Software Engineer to join our team and help us scale our platform for... ..., drive and design large initiatives across...TrainingPerformanceLocal areaRemote workWork from homeRelocationRelocation packageFlexible hours$193.3k - $261.5k
...forefront of maximizing performance for AWS's custom ML accelerators.... ...software boundary, our engineers craft high-performance... ...ML inference and training performance. As part... ...that are very large, yet our teams remain... ...patterns, reliability and scaling) of new and existing...TrainingPerformanceInternshipLocal areaWork from homeFlexible hours$272k - $431.25k
...we are seeking exceptional engineers to join our autonomous driving... ...ll Be Doing: Design and train innovative large-scale models—including generative... ...environments, ensuring performance, safety, and reliability standards... ...deploying production-grade ML models for self-driving,...TrainingPerformanceWork experience placement$272k - $431.25k
...are seeking an exceptional Principal Perception Engineer to lead the design and productization... ...best practices for training and evaluation, using techniques such as large-scale pretraining, distillation,... ...to quantify perception performance; analyze large-scale real and...TrainingPerformance$181.1k - $318.4k
...Staff/Sr. AI Infra Performance Engineer Scaling machine learning workloads across thousands of GPUs and TPUs creates challenges that few... ..., we build the infrastructure that powers large-scale ML training and inference workloads, bringing together expertise...TrainingPerformanceRelocation$206k - $303k
...Principal Engineer - Observability CoreWeave is The Essential... ...to build and scale AI with confidence. Trusted... ...infrastructure performance with deep technical expertise... ...comfort working in large-scale production... ...workloads (e.g., large-scale training/inference, GPU-based...TrainingPerformancePermanent employmentTemporary workCasual workWork at officeRemote workFlexible hours$147k - $211k
Google Inc. is seeking a skilled ML Compiler Software Engineer for its Sunnyvale office. The position requires a Bachelor's degree, proficiency... ...and collaborating with cross-functional teams to maximize performance. The US base salary for this full-time role is between $1...PerformanceFull timeWork at office
- ...GPUs. Our novel wafer-scale architecture provides... ...industry-leading training and inference speeds... ...to effortlessly run large-scale ML applications, without... ...The Inference ML Engineering team at Cerebras Systems... ...platform, leveraging its performance, scalability, and flexibility...TrainingPerformance
$181.1k - $318.4k
...Senior ML Infrastructure Engineer - Training Algorithms, SIML Work Locations (2) Submit Resume Are... ...training, adapting and deploying large-scale generative models. In this role, you... ...algorithm owners for analyzing quality / performance tradeoffs of downstream...TrainingPerformanceRelocation$181.1k - $318.4k
...Sr. / Staff ML Engineer, FM Training Integration - ML Compute We are looking for a ML Engineer... ...role, you will lead the integration of large-scale ML workloads with cloud... ...engineers, and researchers to optimize performance, improve system efficiency, and drive...TrainingPerformanceRelocation$189.3k - $320.7k
...transportation on a global scale. Are you passionate... .... As a Staff ML Engineer on the Prometheus team... ...vehicle development-from training and validation to testing... ...use of GM's large scale datasets, utilizing... ...payouts based on company performance, job level, and individual...TrainingPerformanceLocal areaRemote workWork from homeRelocationRelocation packageFlexible hours
- ...team of researchers, engineers, and designers who have... ...The role: As our first ML Engineer specializing... ...to deliver exceptional performance. What you’ll do:... ...integrate researcher‑trained model checkpoints into... ...multi‑GPU inference and large‑scale model serving. Are well...TrainingPerformanceRelocationVisa sponsorshipRelocation packageShift work
$153.2k - $234.1k
...transportation on a global scale. Role: Are you... .... As a Senior ML Infra Engineer, you will work on the... ...rapid dataset generation, training, evaluation and... ...models. From enabling large foundational driving models... ...training pipelines that are performant, easy to use, and...TrainingPerformanceLocal areaRemote workWork from homeRelocation packageFlexible hours

