Distributed Machine Learning Engineer
Institute of Foundation Models
$150k
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you’ll have the opportunity to work on the core of cutting‑edge foundation model training, alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
The Distributed ML Engineer will work at the forefront of performance optimization for machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting‑edge systems. The ideal candidate will have a strong background in parallel computing and hands‑on experience with system‑level coding, debugging methodologies, and large‑scale machine learning.
Key Responsibilities
- Analyze, profile, and optimize deep learning workloads on state‑of‑the‑art hardware and software platforms, and advise the team on improving their efficiency at different levels of optimization
- Design and implement performance benchmarks and testing methodologies to evaluate application performance
- Build tools to automate workload analysis, workload optimization, and other critical workflows
- Triage system issues and identify bottlenecks and inefficiencies by analyzing their root causes and their impact on hardware and network, and propose solutions to improve GPU utilization
- Support the team to develop appropriate kernels and systems for new model architectures and algorithms
- Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
- Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Represent MBZUAI at industry conferences and events, showcasing the institution’s cutting‑edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
- Perform all other duties as reasonably directed by the line manager that are commensurate with the functional objectives.
Academic Qualifications
- Ph.D. in CS, EE or CSEE with 1+ years of work experience, OR
- Master's in CS, EE or CSEE (or equivalent experience) with 2+ years of work experience
$150,000 - $450,000 a year
Visa Sponsorship
This position is eligible for visa sponsorship.
Benefits Include
- Comprehensive medical, dental, and vision benefits
- Bonus
- 401K Plan
- Generous paid time off, sick leave and holidays
- Paid Parental Leave
- Employee Assistance Program
- Life insurance and disability