Distributed Machine Learning Engineer

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you'll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

The Distributed ML Engineer will be at the forefront of optimizing the performance of machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting-edge systems. The ideal candidate will have a strong background in parallel computing and hands-on experience with system-level coding, debugging methodologies, and large-scale machine learning.

Key Responsibilities

  • Understand, analyze, profile, optimize, and provide guidance to the team on deep learning workloads on state-of-the-art hardware and software platforms to improve their efficiency with different levels of optimization
  • Design and implement performance benchmarks and testing methodologies to evaluate application performance
  • Build tools to automate workload analysis, workload optimization, and other critical workflows
  • Triage system issues and identify bottlenecks and inefficiencies by analyzing the sources of issues and their impact on hardware and network, and propose solutions to enhance GPU utilization
  • Support the team in developing appropriate kernels and systems for new model architectures and algorithms
  • Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
  • Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
  • Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
  • Represent MBZUAI at industry conferences and events, showcasing the institution's cutting-edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
  • Perform all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.

Academic Qualifications

  • Ph.D. in CS, EE, or CSEE with 1+ years of working experience, OR
  • Master's in CS, EE, or CSEE, or equivalent experience, with 2+ years of working experience

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability

Vacancy posted 2 days ago