Machine Learning Engineer - Distributed ML Systems

Pluralis Research

Senior/Staff Engineer

Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.

We're looking for Senior/Staff engineers with 5+ years of experience in distributed systems and large-scale ML training. You'll be implementing a novel substrate for distributed training of ML models that works over consumer-grade internet connections.

Distributed Training Architecture & Optimization
  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.

  • Develop and optimize parallel training strategies (data, tensor, and pipeline parallelism) with custom sharding techniques that minimize communication overhead.

  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.

  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.

  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.

Decentralized Networking & Resilience
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.

  • Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.

  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.

  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

What You'll Bring
  • Strong experience building and operating distributed systems in production.

  • Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).

  • Deep understanding of parallel training strategies (data, tensor, and pipeline parallelism).

  • Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).

  • Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.

  • Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.

What We Offer
  • Equity-heavy compensation with meaningful ownership in a mission-driven company

  • Competitive base salary for senior engineering roles in Australia

  • Visa sponsorship available for exceptional candidates

  • Remote-first with optional access to our Melbourne hub

  • World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups

Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolizing model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.

Vacancy posted 15 hours ago