
Principal ML Engineer: Large-Scale Training Performance

Advanced Micro Devices

A leading technology firm is seeking a Principal Machine Learning Engineer in San Jose, CA. The role focuses on optimizing distributed training for large models, making significant contributions to AMD's AI platform. The ideal candidate should have expertise in distributed training algorithms and be proficient in frameworks like PyTorch or TensorFlow. This position offers a collaborative environment dedicated to innovation.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Principal ML Engineer: Large-Scale Training Performance in San Jose, CA vacancy
  •  ...are looking for a Principal Machine Learning Engineer to join our Models...  ...challenge of distributed training of large models on a large...  ...generative AI at scale. THE PERSON:...  ...training pipeline performance. Optimize the distributed...  ...Experience with ML/DL frameworks such... 
    Training
    Performance

    Advanced Micro Devices , Inc.

    San Jose, CA
    4 days ago
  • $136.5k - $253.5k

     ...skyrocketed due to increasing performance demands from AI. We are a...  ...team of software developers, ML scientists, and research-minded engineers on a mission to change that....  ...learning engineers experienced in training large language models at scale, as well as accomplished... 
    Training
    Performance

    Cadence Inc

    San Jose, CA
    1 day ago
  • $136.5k - $253.5k

     ...skyrocketed due to increasing performance demands from AI. We are a...  ...team of software developers, ML scientists, and research-minded engineers on a mission to change that....  ...learning engineers experienced in training large language models at scale, as well as accomplished... 
    Training
    Performance

    Cadence Design Systems

    San Jose, CA
    4 days ago
  • $220k - $245k

     ...natural advantages for scale: photons don't feel...  ...interrelated areas: training and deploying ML models that operate directly...  ...in exploring large, complex design spaces...  ...training pipelines for high-performance model development....  ...with physicists and engineers to translate quantum... 
    Training
    Performance
    Full time
    Shift work

    PsiQuantum

    Milpitas, CA
    20 days ago
  • $159.3k - $230.7k

     ...transportation on a global scale. The Data...  ...on and delivers ML models to the...  ...impacting AV product performance through smart use...  ...uses existing very large datasets that GM has...  ...model pre-training and fine-tuning with...  ...impact team of AI/ML engineers, data scientists and... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    3 days ago
  •  .... Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The...  ...on distributed training of large models and improve training...  ...enhancing pipeline performance, contributing to open source...  ...learning and experience with ML frameworks like PyTorch and... 
    Training
    Performance

    Advanced Micro Devices

    San Jose, CA
    3 days ago
  • $153.2k - $234.1k

     ...transportation on a global scale. Role Overview:...  ...machine learning engineer working on our...  ...the safety and performance of the car, rather...  .... As a Senior ML Infra Engineer, you...  ...learning model training and evaluation workflows...  ...building large-scale distributed... 
    Training
    Performance
    Work at office
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • $275.8k - $340.5k

     ...the team: The AV ML Infra team at GM builds...  ...productivity of ML engineers, and drive the...  ...Ensures robust model performance by running large-scale simulation workloads...  ...and optimizes large-scale ML training and inference across...  ...Overview: The Principal AI/ML Engineer will... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • $296.3k

     ...the team: The AV ML Infra team at GM builds...  ...productivity of ML engineers, and drive the...  ...Ensures robust model performance by running large-scale simulation workloads...  ...and optimizes large-scale ML training and inference across...  ...Overview: The Principal AI/ML Engineer will... 
    Training
    Performance
    Local area
    Work from home
    Flexible hours

    General Motors

    Sunnyvale, CA
    10 hours ago
  • $272k - $431.25k

     ...looking for a Machine Learning (ML) Engineer to join the GPU accelerated...  ...engine in data centers for running large scale workloads for ETL, SQL, and ML/DL model training and inference pipelines,...  ...machine learning solutions for performance prediction and optimization of... 
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $157.2k - $254.1k

     ...Machine Learning Engineer to join our pioneering...  ...generative AI, large language models (LLMs...  ...to automate and scale our detection and...  ...experience building, training, and deploying...  ...record of taking ML projects from initial...  ...Go, or Rust) for performance-critical components... 
    Training
    Performance
    Full time
    Work at office

    Palo Alto Networks

    San Jose, CA
    5 days ago
  • $296.3k

     ...Role: We are seeking a Principal AI Engineer to lead the design and advancement...  ...that powers large-scale training and cloud inference. This...  ...build, and optimize core AI/ML platform infrastructure to...  ...reliability, scalability, and performance across the AI/ML platform.... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Flexible hours

    General Motors

    Sunnyvale, CA
    10 hours ago
  • $182k - $260k

     ..., faster. We build high-performing teams that can make an impact...  ...We are looking for a Principal Machine Learning Engineer to join our ML/AI team. This is a...  ...security challenges at scale. What you’ll do (Role...  ...and relevant education or training. The base salary range... 
    Training
    Performance
    Full time
    Work at office
    Local area
    Worldwide

    Zscaler

    San Jose, CA
    5 days ago
  • $228.1k - $393.8k

     ...Machine Learning Engineering Manager – Ads Predictions...  ...who has built and scaled complex machine...  ...and deploying large-scale models, with...  ...from data and model training pipelines to real-...  ...are a hands-on ML leader who can drive...  ...Lead and grow a high-performing team of ML... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    1 day ago
  • $180k

     ...While today's AI largely operates through chat...  ...and manage large-scale GPU computing...  ...clusters powering our AI training and deployment...  ...intersection of systems engineering and machine...  ...Partner closely with ML researchers and...  ...level tooling and performance-critical services.... 
    Training
    Performance
    Full time

    Hark

    San Jose, CA
    1 day ago
  •  ...ML Engineer / Generalist HypeLab is a small, profitable...  ...at real marketplace scale. We process more than...  ...strong service, and better performance matter. Our customers...  ...not be tucked away training models with no connection...  ...and do not need a large team around you to make... 
    Training
    Performance

    Hypelab

    San Jose, CA
    3 days ago
  • $181.1k - $318.4k

     ...Sr. ML Engineer, Siri User Experience Metrics and Data...  ...identify regressions in Siri performance and alert engineering...  ..., building and training ML models using distributed...  ...Experience applying large language models (LLMs)...  ...infrastructure, and large-scale operations, including... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    10 hours ago
  • $155.42k - $395.9k

     ...About the Team: The ML Compute Platform is...  ...supports the training and deployment of state...  ...with a focus on performance, availability, concurrency...  ...a Senior Software Engineer to join our team and help us scale our platform for...  ..., drive and design large initiatives across... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    1 day ago
  • $193.3k - $261.5k

     ...forefront of maximizing performance for AWS's custom ML accelerators....  ...software boundary, our engineers craft high-performance...  ...ML inference and training performance. As part...  ...that are very large, yet our teams remain...  ...patterns, reliability and scaling) of new and existing... 
    Training
    Performance
    Internship
    Local area
    Work from home
    Flexible hours

    Amazon

    Cupertino, CA
    10 hours ago
  • $272k - $431.25k

     ...we are seeking exceptional engineers to join our autonomous driving...  ...ll Be Doing: Design and train innovative large-scale models—including generative...  ...environments, ensuring performance, safety, and reliability standards...  ...deploying production-grade ML models for self-driving,... 
    Training
    Performance
    Work experience placement

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...are seeking an exceptional Principal Perception Engineer to lead the design and productization...  ...best practices for training and evaluation, using techniques such as large-scale pretraining, distillation,...  ...to quantify perception performance; analyze large-scale real and... 
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    10 hours ago
  • $181.1k - $318.4k

     ...Staff/Sr. AI Infra Performance Engineer Scaling machine learning workloads across thousands of GPUs and TPUs creates challenges that few...  ..., we build the infrastructure that powers large-scale ML training and inference workloads, bringing together expertise... 
    Training
    Performance
    Relocation

    Apple

    Santa Clara, CA
    10 hours ago
  • $206k - $303k

     ...Principal Engineer - Observability CoreWeave is The Essential...  ...to build and scale AI with confidence. Trusted...  ...infrastructure performance with deep technical expertise...  ...comfort working in large-scale production...  ...workloads (e.g., large-scale training/inference, GPU-based... 
    Training
    Performance
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    1 day ago
  • $147k - $211k

    Google Inc. is seeking a skilled ML Compiler Software Engineer for its Sunnyvale office. The position requires a Bachelor's degree, proficiency...  ...and collaborating with cross-functional teams to maximize performance. The US base salary for this full-time role is between $1... 
    Performance
    Full time
    Work at office

    Google Inc.

    Sunnyvale, CA
    4 days ago
  •  ...GPUs. Our novel wafer-scale architecture provides...  ...industry-leading training and inference speeds...  ...to effortlessly run large-scale ML applications, without...  ...The Inference ML Engineering team at Cerebras Systems...  ...platform, leveraging its performance, scalability, and flexibility... 
    Training
    Performance

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    4 days ago
  • $181.1k - $318.4k

     ...Senior ML Infrastructure Engineer - Training Algorithms, SIML Work Locations (2) Submit Resume Are...  ...training, adapting and deploying large-scale generative models. In this role, you...  ...algorithm owners for analyzing quality / performance tradeoffs of downstream... 
    Training
    Performance
    Relocation

    Apple

    Cupertino, CA
    1 day ago
  • $181.1k - $318.4k

     ...Sr. / Staff ML Engineer, FM Training Integration - ML Compute We are looking for a ML Engineer...  ...role, you will lead the integration of large-scale ML workloads with cloud...  ...engineers, and researchers to optimize performance, improve system efficiency, and drive... 
    Training
    Performance
    Relocation

    Apple

    Santa Clara, CA
    4 days ago
  • $189.3k - $320.7k

     ...transportation on a global scale. Are you passionate...  .... As a Staff ML Engineer on the Prometheus team...  ...vehicle development-from training and validation to testing...  ...use of GM's large scale datasets, utilizing...  ...payouts based on company performance, job level, and individual... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    1 day ago
  •  ...team of researchers, engineers, and designers who have...  ...The role: As our first ML Engineer specializing...  ...to deliver exceptional performance. What you’ll do:...  ...integrate researcher‑trained model checkpoints into...  ...multi‑GPU inference and large‑scale model serving. Are well... 
    Training
    Performance
    Relocation
    Visa sponsorship
    Relocation package
    Shift work

    Photalabs

    San Jose, CA
    10 hours ago
  • $153.2k - $234.1k

     ...transportation on a global scale. Role: Are you...  .... As a Senior ML Infra Engineer, you will work on the...  ...rapid dataset generation, training, evaluation and...  ...models. From enabling large foundational driving models...  ...training pipelines that are performant, easy to use, and... 
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    4 days ago
