Distributed Machine Learning Engineer

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting‑edge foundation model training, alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

The Distributed ML Engineer will work at the forefront of optimizing the performance of machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting‑edge systems. The ideal candidate will have a strong background in parallel computing and hands‑on experience with system‑level coding, debugging methodologies, and large‑scale machine learning.

Key Responsibilities

  • Understand, analyze, profile, optimize, and provide guidance to the team on deep learning workloads on state‑of‑the‑art hardware and software platforms to improve their efficiency with different levels of optimization
  • Design and implement performance benchmarks and testing methodologies to evaluate application performance
  • Build tools to automate workload analysis, workload optimization, and other critical workflows
  • Triage system issues and identify bottlenecks and inefficiencies by analyzing their root causes and their impact on hardware and the network, and propose solutions to improve GPU utilization
  • Support the team in developing appropriate kernels and systems for new model architectures and algorithms
  • Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
  • Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
  • Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
  • Represent MBZUAI at industry conferences and events, showcasing the institution’s cutting‑edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
  • Perform all other duties as reasonably directed by the line manager that are commensurate with the functional objectives.

Academic Qualifications

  • Ph.D. in CS, EE or CSEE with 1+ years of work experience, OR
  • Master's in CS, EE or CSEE, or equivalent experience, with 2+ years of work experience

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life and disability insurance

Vacancy posted 23 hours ago