Distributed Machine Learning Engineer

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy. As part of our team, you'll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

The Distributed ML Engineer will work at the forefront of optimizing performance for machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting-edge systems. The ideal candidate will have a strong background in parallel computing, hands-on experience in system-level coding and debugging methodologies, and large-scale machine learning experience.
Key Responsibilities
  • Understand, analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms, and provide guidance to the team on improving their efficiency at different levels of optimization.
  • Design and implement performance benchmarks and testing methodologies to evaluate application performance.
  • Build tools to automate workload analysis, workload optimization, and other critical workflows.
  • Triage system issues and identify bottlenecks and inefficiencies by analyzing the sources of issues and their impact on hardware and network, and propose solutions to enhance GPU utilization.
  • Support the team in developing appropriate kernels and systems for new model architectures and algorithms.
  • Participate in, or lead, design reviews with peers and stakeholders to decide among available technologies.
  • Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
  • Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
  • Represent MBZUAI at industry conferences and events, showcasing the institution's cutting-edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
  • Perform all other duties as reasonably directed by the line manager that are commensurate with the functional objectives.

Academic Qualifications
  • Ph.D. in CS, EE or CSEE with 1+ years of working experience, OR
  • Master's in CS, EE or CSEE, or equivalent experience, with 2+ years of working experience

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include
  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability

Vacancy posted 10 hours ago
