Distributed Machine Learning Engineer
$150k · Institute of Foundation Models
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you’ll have the opportunity to work at the core of cutting‑edge foundation model training alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will help develop groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
The Distributed ML Engineer will work at the forefront of optimizing the performance of the machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting‑edge systems. The ideal candidate will have a strong background in parallel computing and hands‑on experience with system‑level coding, debugging methodologies, and large‑scale machine learning.
Key Responsibilities
- Understand, analyze, profile, and optimize deep learning workloads on state‑of‑the‑art hardware and software platforms, and guide the team on improving their efficiency at different levels of optimization.
- Design and implement performance benchmarks and testing methodologies to evaluate application performance.
- Build tools to automate workload analysis, workload optimization, and other critical workflows.
- Triage system issues and identify bottlenecks and inefficiencies by analyzing their sources and their impact on hardware and network, and propose solutions to improve GPU utilization.
- Support the team in developing appropriate kernels and systems for new model architectures and algorithms.
- Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
- Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content, and adapt content based on product/program updates and user feedback.
- Represent MBZUAI at industry conferences and events, showcasing the institution’s cutting‑edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
- Perform all other duties, as reasonably directed by the line manager, that are commensurate with the functional objectives.
Academic Qualifications
- Ph.D. in CS, EE, or CSEE with 1+ years of working experience, OR
- Master's in CS, EE, or CSEE, or equivalent experience, with 2+ years of working experience
$150,000 - $450,000 a year
Visa Sponsorship
This position is eligible for visa sponsorship.
Benefits Include
- Comprehensive medical, dental, and vision benefits
- Bonus
- 401K plan
- Generous paid time off, sick leave, and holidays
- Paid parental leave
- Employee Assistance Program
- Life insurance and disability
- An AI lab in Santa Clara is seeking a skilled software engineer with over 8 years of experience to optimize machine learning models for real-time applications. The role involves designing distributed training strategies, collaborating with ML researchers, and developing...
$150k
- ...A leading research lab in Sunnyvale is seeking a distributed ML infrastructure engineer to extend and scale training systems. The ideal candidate must have over 5 years of experience in ML systems with strong expertise in distributed training frameworks like DeepSpeed...
- ...Senior ML Engineer Medical Imaging Evaluation & AI Reliability About the Role:... ...Qualifications: Strong experience in machine learning for medical imaging (radiology, pathology... ...of: model robustness, distribution shift, uncertainty, failure analysis... · Shift work
- ...autonomous agents that reason, act, and continuously improve. As a Machine Learning Engineer , you won't just build models, you'll architect the entire... ...the ground up. You'll work at the intersection of LLMs, distributed systems, and real-world applications , owning everything...
- ...2 Candidates only Position Summary Seeking an experienced Machine Learning Engineer to lead the development of prompt injection and prompt safety... ..., jailbreak, and agentic AI threat models, and with distributed training frameworks (DeepSpeed, FSDP, Accelerate). Preferred...
- ...Role Description We are seeking an experienced GenAI engineer to join our seasoned founding team to drive the... ...rapid development and iteration of scalable, robust distributed infrastructure to support machine learning training, inference, and evaluation. Hands‑on contributor...
$181.1k - $318.4k
- ...Machine Learning Compiler Engineer At Apple, we're on the cutting edge of delivering transformative experiences through Artificial Intelligence... ...and functions Experience optimizing compilers for distributed, parallel, or heterogeneous execution environments, with... · Relocation
$170k - $216k
- ...Perception Machine Learning Engineer Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver... ...and post-training. Develop methods and recipes for distributed fine-tuning enabling multiple developers to simultaneously... · Full time · Remote work
- ...About the job Machine Learning Engineer Glint Tech Solutions is Hiring an experienced Machine Learning Engineer to join our client's... ...Hands-on experience with Kubernetes and container orchestration Strong understanding of scalability and distributed systems
$147.4k - $220.9k
- ...Machine Learning Engineer - LLM Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services... ...tools, i.e. Claude Code, Roo Code, etc. Familiarity with distributed computing, cloud infrastructure, and orchestration tools,... · Relocation
- ...a team of passionate AI researchers, engineers, and advertising veterans. Join us in... ...with AI. About this role We’re hiring a Machine Learning Engineer to design and scale advanced... ...level software, compilers, and network distribution software for massive social data and prediction... · Full time
$123.75k - $185k
- ...product managers, UX designers, and other engineers to define requirements and deliver... ...Diagnose and troubleshoot issues in complex distributed environments and optimize system... ...Qualifications Knowledge and passion in machine learning algorithms, Gen AI, LLMs, and natural... · Work experience placement · Work at office · 3 days per week
$140k - $220k
- ...feedback and needs. ABOUT THE JOB We are looking for a Machine Learning Engineer to help build and develop our ML capabilities at RADAR.... ...Expertise in big data processing including SQL optimization and distributed computing (Spark/Dask) ~ Production experience with... · Work at office · Flexible hours
$147.4k - $272.1k
- ...customer front and center? The Video Engineering group at Apple is responsible for creating... ...all Apple products and services. As a machine learning engineer, you’ll be developing machine... .... Additionally, you’ll strategically distribute computational workloads across CPU,... · Relocation
$181.1k - $318.4k
- ...experience where customers can shop, buy and learn everything Apple, wherever they are.... ..., and hands‑on applied Senior Machine Learning Engineer. This role will assist our Online Retail... ..., Java, C++ and experience building distributed systems. Experience building data... · Work experience placement · Relocation
- ...A leading technology company is seeking a Fellow/Sr. Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The candidate will work on distributed training of large models and improve training efficiency. Responsibilities include enhancing...
- ...product managers, UX designers, and other engineers to define requirements and deliver... ...Diagnose and troubleshoot issues in complex distributed environments and optimize system... ...Qualifications Knowledge and passion in machine learning algorithms, GenAI, LLMs, and Agentic... · Work experience placement
$181.1k - $318.4k
- ...Sr. Machine Learning Engineer, Siri Speech Cupertino, California, United States Machine Learning and AI We are a group of engineers/researchers... ..., training, evaluation, deployment Familiarity with distributed training and large‑scale data pipelines Solid understanding... · Relocation
$147.4k - $272.1k
- ...Machine Learning Engineer - Speech & Multimodal Language Modeling Cupertino, California, United States | Machine Learning and AI Apple is where... ...LLMs Experience with large‑scale audio data processing on distributed systems Experience with prompt evaluation and optimization... · Relocation
$181.1k - $318.4k
- ...Sunnyvale, California, United States Machine Learning and AI Help define the next generation... ...video experiences at Apple. The Video Engineering group develops key image and video technologies... ...rendering. Experience with distributed training at scale. Familiarity with on... · Relocation
- ...organizations that keep the world running. Our Team's Vision: Our Engineering team is shaping the future of cybersecurity. We thrive on... ...As a Senior Software Engineer, you will architect high‑scale distributed systems that process massive data volumes to fuel our Agentic... · Immediate start
$212k - $318.4k
- ...Santa Clara, California, United States Machine Learning and AI Are you interested in enhancing the capabilities... ..., including applied machine learning engineers with a focus on ML and LLM, and experienced distributed systems engineers. As such, we are seeking candidates... · Work experience placement · Relocation
$181.1k - $318.4k
- ...Sr. Machine Learning Engineer - Answers, Knowledge & Information (AKI) Siri helps hundreds of millions... ...from applied scientists with a focus in NLP to experienced distributed systems. We are looking for candidates with both applied machine... · Local area · Relocation
$181.1k - $318.4k
- ...Senior Machine Learning Engineer - Ads Bidding & Pacing At Apple, we focus deeply on our customers' experience. Apple Ads brings this same... ...Java or Python. ~ Experience with Spark, Hadoop or other distributed frameworks. ~ PhD in Machine Learning, Statistics, Control... · Relocation
$181.1k - $318.4k
- ...AIML - Machine Learning Engineer, Foundation Models Apple is revolutionizing artificial intelligence by developing sophisticated foundation... ...such as Jax, PyTorch, TensorFlow Strong background in: Distributed training, Model optimization, and Machine learning infrastructure... · Relocation
$195k - $230k
- ...visit About the Role We are looking for a Senior Machine Learning Engineer to help evolve our large-scale recommendation systems... ...working with large-scale data and ML systems (e.g., Spark, distributed training, real-time inference). - Ability to own features... · Full time · Local area · Work from home
$148.7k - $223.1k
- ...Requisition ID: JOBREQ-2615661 Senior Machine Learning Engineer, AI Automation Mountain View, CA, USA, Full-time ALERT: Unity has received reports... ...Go, or Python, with a background in building large-scale distributed systems and real-time data pipelines (Kafka, Flink, or... · Full time · Fixed term contract · Work at office · Local area · Remote work · Worldwide · Relocation package
- ...We are looking for the best At 42dot, our Machine Learning Engineers conduct research and development on machine learning algorithms to ensure... ...labeled data (low-shot learning), and handling class distribution imbalances (long-tail learning) commonly found in autonomous...
$126k - $181.5k
- ...Software Engineering Mountain View, California Machine Learning Engineering TL, Behavior Planning Who we are Aurora’s mission is to deliver the benefits... ...training large models on massive datasets using distributed computing. ~ Fluency in Python, with a focus on... · Local area
$181.1k - $318.4k
- ...Senior Machine Learning Engineer, AI, SIML Are you passionate about Generative AI and excited to work... ...Experience with parallel training libraries such as PyTorch Distributed (torch.distributed), DeepSpeed, or FairScale. Experience... · Relocation · Flexible hours
Similar vacancies to Distributed Machine Learning Engineer:
- machine learning software engineer Sunnyvale, CA
- ai ml engineer Sunnyvale, CA
- computer vision machine learning engineer Sunnyvale, CA
- machine learning engineer Sunnyvale, CA
- senior ml engineer Sunnyvale, CA
- machine learning ai engineer Sunnyvale, CA
- data scientist machine learning engineer Sunnyvale, CA
- machine learning intern Sunnyvale, CA
- machine learning part time Sunnyvale, CA
- machine learning Sunnyvale, CA



