Principal ML Engineer: Large-Scale Training Performance

Advanced Micro Devices, Inc.

A leading technology firm is seeking a Principal Machine Learning Engineer in San Jose, CA. The role focuses on optimizing distributed training for large models, making significant contributions to AMD's AI platform. The ideal candidate should have expertise in distributed training algorithms and be proficient in frameworks like PyTorch or TensorFlow. This position offers a collaborative environment dedicated to innovation.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Principal ML Engineer: Large-Scale Training Performance in San Jose, CA vacancy
  •  ...beyond. Together, we advance your career. PMTS Large Scale Training Performance Optimization ENGINEER THE ROLE: We are looking for a Principal Machine Learning Engineer to join our...  ...stakeholders. PREFERRED EXPERIENCE: Experience with ML/DL frameworks such as PyTorch, JAX, or... 
    Training
    Performance

    Advanced Micro Devices, Inc.

    San Jose, CA
    11 hours ago
  •  .... Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The...  ...on distributed training of large models and improve training...  ...enhancing pipeline performance, contributing to open source...  ...learning and experience with ML frameworks like PyTorch and... 
    Training
    Performance

    Advanced Micro Devices, Inc.

    San Jose, CA
    12 hours ago
  • $224k - $356.5k

     ...Machine Learning and Simulation Engineers for their Autonomous...  ...design and development of large-scale RL training frameworks to enhance multi...  ...on simulation accuracy and performance. The ideal candidate has over...  ...12 years of experience in ML training, simulating AV systems... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  • $189k - $300k

     ...transportation on a global scale. The Data Scaling...  ...on and delivers ML models to the...  ...AV product performance through smart use...  ...uses existing very large datasets that GM has...  ...foundation model pre‑training and fine‑tuning with...  ...team of AI/ML engineers, data scientists and... 
    Training
    Performance
    Local area
    Remote work
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    11 hours ago
  • $153.2k - $234.1k

     ...transportation on a global scale. Role Overview:...  ...machine learning engineer working on our...  ...the safety and performance of the car, rather...  .... As a Senior ML Infra Engineer, you...  ...learning model training and evaluation workflows...  ...building large-scale distributed... 
    Training
    Performance
    Work at office
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    5 days ago
  • $275.8k - $340.5k

     ...the team: The AV ML Infra team at GM builds...  ...productivity of ML engineers, and drive the...  ...Ensures robust model performance by running large-scale simulation workloads...  ...and optimizes large-scale ML training and inference across...  ...Overview: The Principal AI/ML Engineer will... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    5 days ago
  • $296.3k

     ...the team: The AV ML Infra team at GM builds...  ...productivity of ML engineers, and drive the...  ...Ensures robust model performance by running large-scale simulation workloads...  ...and optimizes large-scale ML training and inference across...  ...Overview: The Principal AI/ML Engineer will... 
    Training
    Performance
    Local area
    Work from home
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • $296.3k

     ...The Role: We are seeking a Principal AI Engineer to lead the design and...  ...infrastructure that powers large-scale training and cloud inference. This...  ...build, and optimize core AI/ML platform infrastructure to...  ...reliability, scalability, and performance across the AI/ML platform.... 
    Training
    Performance
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    12 hours ago
  • $291.5k - $369.1k

     ...expertise with the scale and operational...  ...and Cisco's global engineering capabilities. Our...  ...following areas: large language modeling...  ...Large-Scale Training & Optimization - Experience...  ...production monitoring of ML models...  ...sales plans earn performance-based incentive... 
    Training
    Performance
    Full time
    Temporary work
    Local area
    Flexible hours

    Cisco

    San Jose, CA
    3 days ago
  • $206.4k - $384.68k

     ...are hiring a Director, ML Engineering to own the...  ...services at enterprise scale. This is a multi-faceted...  ...inference quality matches the training environment, and to...  ...visions into reliable, high-performance ML systems that...  ...engineering reality of large-scale inference: accelerator... 
    Training
    Performance
    Temporary work
    Local area
    Worldwide

    Adobe

    San Jose, CA
    5 days ago
  • $157.2k - $254.1k

     ...Machine Learning Engineer We are seeking...  ...using generative AI, large language models (...  ...systems to automate and scale our detection and...  ...building, training, and deploying machine...  ...record of taking ML projects from initial...  ...Go, or Rust) for performance-critical... 
    Training
    Performance

    Palo Alto Networks

    Santa Clara, CA
    2 days ago
  • $291.5k - $369.1k

     ...expertise with the scale and operational...  ...and Cisco’s global engineering capabilities. Our...  ...and deployment for large‑scale foundation models...  .... Large‑Scale Training & Optimization –...  ...monitoring of ML models. Strong...  ...sales plans earn performance-based incentive pay... 
    Training
    Performance
    Full time
    Temporary work
    Local area
    Flexible hours

    CISCO, Inc.

    Milpitas, CA
    4 days ago
  •  ...company in Sunnyvale is seeking a Staff Software Engineer to drive AI/ML performance. The successful candidate will handle large-scale system design and optimization challenges,...  ...teams to ensure peak performance of ML training and serving workflows, directly impacting... 
    Training
    Performance

    Google Inc.

    Sunnyvale, CA
    2 days ago
  •  ...Scientist specializing in privacy-preserving model training and architecture optimization in San Jose. The candidate will design and optimize large-scale training architectures for advanced generative models and lead performance optimization efforts across GPUs. This role... 
    Training
    Performance

    Ellis Technologies, Inc.

    San Jose, CA
    1 day ago
  • $180k

     ...While today's AI largely operates through chat...  ...and manage large-scale GPU computing...  ...clusters powering our AI training and deployment...  ...intersection of systems engineering and machine...  ...Partner closely with ML researchers and...  ...level tooling and performance-critical services.... 
    Training
    Performance
    Full time

    Hark

    San Jose, CA
    4 days ago
  • $228.1k - $393.8k

     ...Machine Learning Engineering Manager – Ads Predictions...  ...who has built and scaled complex machine...  ...and deploying large-scale models, with...  ...from data and model training pipelines to real-...  ...are a hands-on ML leader who can drive...  ...Lead and grow a high-performing team of ML... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    3 days ago
  • $272k - $431.25k

     ...is seeking a Senior MLOps Engineering Manager to join our...  ...development, and operation of large‑scale, end‑to‑end data and ML pipelines that power NVIDIA...  ...radar—into high‑quality training, evaluation, and...  ...Doing Lead and grow a high‑performing MLOps engineering group tasked... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  • $181.1k - $318.4k

     ...Sr. ML Engineer, Siri User Experience Metrics and Data...  ...identify regressions in Siri performance and alert engineering...  ..., building and training ML models using distributed...  .... Experience applying large language models (LLMs)...  ..., and large‑scale operations, including... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    12 hours ago
  • $144.7k - $261.3k

     ..., cloud infrastructure, and ML/AI GPU platforms for AV research...  ...GM is looking for a Senior Performance Engineer to join the AV Capacity and...  ...is to provide input into large scale ML infrastructure strategy,...  ...reliability of large-scale ML training and inference environments.... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Flexible hours
    3 days per week

    General Motors

    Sunnyvale, CA
    1 day ago
  •  ...ML Engineer / Generalist HypeLab is a small, profitable...  ...at real marketplace scale. We process more than...  ...strong service, and better performance matter. Our customers...  ...not be tucked away training models with no connection...  ...and do not need a large team around you to make... 
    Training
    Performance

    Hypelab

    San Jose, CA
    5 days ago
  •  ...Staff/Sr. ML Compute Efficiency Engineer Scaling machine learning workloads across thousands of GPUs and...  ...the infrastructure that powers large-scale ML training and inference workloads, bringing...  ...infrastructure, and high-performance computing. As a performance engineer... 
    Training
    Performance

    Apple

    Santa Clara, CA
    2 days ago
  •  ...to optimize resource usage for training and fine-tuning models, ensuring high performance while maintaining efficiency....  ...of operational expenses through large-scale adoption. Responsibilities...  ...functional teams, including product, engineering, and business stakeholders, to... 
    Training
    Performance

    Kognitos

    San Jose, CA
    11 hours ago
  • $156k - $316.8k

     ...Research Scientist — Privacy-Preserving Large-Scale Model Training & Architecture Optimization Location:...  ...diffusion systems). Lead GPU-centric performance optimization, including memory layout,...  ...Qualifications: Experience with privacy-preserving ML, sensitive data training, or regulated... 
    Training
    Performance
    Temporary work
    Local area

    Ellis Technologies, Inc.

    San Jose, CA
    12 hours ago
  • $189.3k - $320.7k

     ...Design and implement ML solutions aligned...  ...as unsupervised pre‑training, supervised learning, model scaling/selection, and...  ...systematic use of GM’s large‑scale datasets, utilizing...  ...Support and mentor engineers through technical...  ...linked to company performance, job level, and... 
    Training
    Performance
    Local area
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    12 hours ago
  • $170.7k - $300.2k

     ...Group is looking for engineers to work on developing...  ...with experts in high-performance computing, machine learning...  ...proficiency in ML modeling frameworks, experience...  ...and integrating large-scale distributed machine learning...  ..., improving GPU training approaches, developing... 
    Training
    Performance

    Career-Mover

    Cupertino, CA
    1 day ago
  • $147k - $211k

    Google Inc. is seeking a skilled ML Compiler Software Engineer for its Sunnyvale office. The position requires a Bachelor's degree, proficiency...  ...and collaborating with cross-functional teams to maximize performance. The US base salary for this full-time role is between $1... 
    Performance
    Full time
    Work at office

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • $272k - $431.25k

     ...Principal Deep Learning Senior Engineer, End-To-End Autonomous Driving...  ...Doing: Design and train innovative large-scale models—including generative...  ...environments, ensuring performance, safety, and...  ...deploying production-grade ML models for self-driving... 
    Training
    Performance
    Work experience placement

    NVIDIA

    Santa Clara, CA
    12 hours ago
  • $181.1k - $272.1k

     ...ML Infrastructure Engineer - Multimodal Training Tools, SIML...  ...Are you passionate...  ..., adapting and deploying large-scale generative models. You will be working...  ...training and inference performance Integrating efficient data loading... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    3 days ago
  • $272k - $431.25k

     ...establish best practices for training and evaluation, using techniques such as large-scale pretraining,...  ...to quantify perception performance; analyze large-scale real...  ...mentorship to other engineers, influencing design and...  ...leadership as a senior or principal‑level individual... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  •  ...GPUs. Our novel wafer-scale architecture provides...  ...industry-leading training and inference speeds...  ...to effortlessly run large-scale ML applications, without...  ...The Inference ML Engineering team at Cerebras Systems...  ...platform, leveraging its performance, scalability, and flexibility... 
    Training
    Performance

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    1 day ago
