Machine Learning Engineer - Distributed ML Systems
Pluralis Research
Senior/Staff Engineer
Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.
We're looking for Senior/Staff engineers with 5+ years of experience in distributed systems and large-scale ML training. You'll be implementing a novel substrate for training distributed ML models that works over consumer-grade internet connections.
Distributed Training Architecture & Optimization
Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
Develop and optimize parallel training strategies (data, tensor, and pipeline parallelism) with custom sharding techniques that minimize communication overhead.
Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.
Decentralized Networking & Resilience
Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.
What You'll Bring
Strong experience building and operating distributed systems in production.
Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).
Deep understanding of training parallelism strategies (data, tensor, and pipeline parallelism).
Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).
Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.
Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.
What We Offer
Equity-heavy compensation with meaningful ownership in a mission-driven company
Competitive base salary for senior engineering roles in Australia
Visa sponsorship available for exceptional candidates
Remote-first with optional access to our Melbourne hub
World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups
Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolizing model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.

