Distributed Systems/ML Engineer

$245k - $385k

Dormont Manufacturing Co

About the Team

The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.

About the Role

As a Distributed Systems/ML engineer, you will work on improving the training throughput for our internal training framework, while enabling researchers to experiment with new ideas. This requires good engineering (for example designing, implementing, and optimizing state-of-the-art AI models), writing bug‑free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. In all the projects this role pursues, the ultimate goal is to push the field forward.

We’re looking for people who love optimizing performance, understanding distributed systems, and who cannot stand having bugs in their code. Since our training framework is used for large runs with massive numbers of GPUs, performance improvements here will have a large impact.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:
  • Apply the latest techniques in our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework
  • Work with researchers to enable them to develop the next generation of models

You might thrive in this role if you:
  • Have run small scale ML experiments
  • Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden
  • Have strong software engineering skills and are proficient in Python

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation

$245K – $385K

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. In addition to the salary range listed above, total compensation also includes generous equity and benefits.
  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • 401(k) plan with 50% matching
  • Unlimited time off and 13 company holidays per year
  • Paid parental leave (20 weeks) and family‑planning support
  • Annual learning & development stipend ($1,500 per year)

Vacancy posted 1 day ago