ML Infra Engineer — Scalable Training Systems

Monograph

A leading tech company in San Francisco seeks a Machine Learning Engineer to build and maintain infrastructure for large-scale model training. In this hands-on role, you will design systems, work closely with researchers, and optimize training processes. Candidates should have strong software engineering skills and experience with JAX or PyTorch. Join a dynamic team at the forefront of machine learning and contribute to core training code and systems.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on this vacancy in San Francisco, CA
  •  ...is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation...  ...will design distributed training systems and optimize GPU utilization while...  ...over 5 years of experience in ML infrastructure and a strong background... 
    Training

    Baseten

    San Francisco, CA
    6 days ago
  •  ...pioneering AI firm based in San Francisco is seeking a Research Engineer, Distributed Data Systems. In this role, you will design and maintain infrastructure for large-scale multimodal training, ensuring scalability and reliability of data systems. Candidates should have... 
    Training
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    2 days ago
  •  ...Technologies is seeking a Machine Learning Platform Engineer to help build scalable systems that support model training for their Machine Learning Platform team in San...  ...enhance the productivity of data scientists and ML engineers. The ideal candidate should have a strong... 
    Training

    CVFine by Instrovate Technologies

    San Francisco, CA
    2 days ago
  • $295k - $380k

     ...OpenAI is searching for a Senior Software Engineer to join their Robotics team in San Francisco. The...  ...focuses on maintaining and improving the training framework while actively reviewing and debugging code within ML systems. The ideal candidate should thrive in hands-on... 
    Training

    OpenAI

    San Francisco, CA
    5 days ago
  •  ...seeking an experienced Software Engineer to develop machine learning...  ...infrastructure for monetization and ads systems. The role involves building data pipelines, creating training platforms, and collaborating...  ...in distributed systems and ML workflows. Join us in shaping the... 
    Training

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  •  ...Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves designing end-to-end...  ...real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and... 

    Acceler8 Talent

    San Francisco, CA
    5 days ago
  •  ...in San Francisco, is hiring a Machine Learning Infra Engineer. This role involves building and maintaining the training and inference frameworks necessary for optimal...  ...possess strong Python skills, have a background in systems engineering, and experience with Kubernetes.... 
    Training

    Reducto

    San Francisco, CA
    5 days ago
  • ML Systems Engineer - Robotics & AI We are building the full-stack foundation for the next generation...  ...and handling scenarios unseen in training. We work at the intersection of large-...  ...engineers to translate model changes into scalable implementations. Provide guidance on... 
    Training

    Maxwell Bond

    San Francisco, CA
    3 days ago
  •  ...for the physical world. Training our models requires...  ...to-end: the scheduling systems, the placement logic, the...  ...seamless. The Team The ML Infrastructure team supports...  ...work closely with ML Infra (training systems),...  ...Bring Strong software engineering fundamentals Experience... 
    Training

    Physical Intelligence

    San Francisco, CA
    4 days ago
  •  ...A leading streaming service is seeking a Staff Software Engineer to enhance ML infrastructure. The role involves designing scalable systems, mentoring engineers, and collaborating with cross-functional teams. Candidates should have over 8 years of experience in building... 

    Tubi TV

    San Francisco, CA
    5 days ago
  • A tech-driven company focused on blockchain solutions is seeking a Senior ML Systems Engineer. In this role, you will build reusable workflows, automate model versioning, and deploy scalable AI systems. Candidates should have strong programming skills, experience with... 

    TRM Labs

    San Francisco, CA
    1 day ago
  • TRM Labs is looking for a Senior or Staff ML Systems Engineer to focus on building and scaling the technical infrastructure for AI/ML systems...  ...strong Python programming skills, a solid background in scalable infrastructure, and experience deploying LLM workflows. Join... 

    TRM Labs

    San Francisco, CA
    1 day ago
  • AI Chopping Block, Inc. is seeking a Machine Learning Engineer to design and build scalable machine learning systems. Responsibilities involve developing end-to-end ML pipelines, optimizing AI models for mobile environments, and integrating AI-driven solutions into applications... 

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  • $248.8k - $311k

     ...research in Physical AI and developing ML pipelines for processing, training, and fine-tuning on data collected...  ...AI. The Role As an ML Systems Engineer on the Physical AI team, you will design and build platforms for scalable, reliable, and efficient serving of... 
    Training
    Full time

    Scale AI

    San Francisco, CA
    17 days ago
  •  ...Technical Staff to focus on cutting-edge AI research and development. The role involves building and scaling training and inference infrastructure, designing ML kernels, and optimizing performance. Ideal candidates should have a passion for addressing ambitious challenges... 
    Training

    Mirendil

    San Francisco, CA
    1 day ago
  • $200k - $240k

     ...world for all. The AI Engineering Team is chartered with...  ...Models (LLMs) and agentic systems. Our mission is to...  ...As a Senior or Staff ML Systems Engineer - LLM...  ...CD workflows for model training, evaluation, and deployment...  ...out a modular and scalable AI infrastructure... 
    Training
    Remote work
    Worldwide

    TRM Labs

    San Francisco, CA
    3 days ago
  • A leading AI technology company in San Francisco is seeking an engineering professional to develop and manage intelligent job scheduling systems for large-scale AI applications. This role focuses on ensuring efficient resource allocation across GPU and TPU clusters while... 
    Training

    Physical Intelligence

    San Francisco, CA
    1 day ago
  •  ...Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will have hands-on experience with modern... 
    Training

    Reflection AI

    San Francisco, CA
    1 day ago
  • A leading AI company in San Francisco is seeking a skilled ML Infrastructure Engineer to manage and optimize large-scale training systems. In this role, you will design and maintain infrastructure for model training, ensuring efficient GPU/TPU utilization while working... 
    Training

    Physical Intelligence

    San Francisco, CA
    3 days ago
  •  ...like PDFs and spreadsheets. We train vision models to read those...  ...are hiring a Machine Learning Engineer to help us train and deploy...  ...product. The Opportunity As an ML Infra Engineer, you’ll play a key...  ...work to apply them. Design systems for scaling model training across... 
    Training
    Work at office
    Local area

    Reducto

    San Francisco, CA
    5 days ago
  •  ...don't believe culture can be engineered - but when it falls into place...  ...We're looking for an ML infrastructure engineer to help...  ..., and scale the foundational systems we need to realize our ambitious...  ...supports every stage of the ML training flywheel and be an important... 
    Training
    Local area

    Humble Robotics

    San Francisco, CA
    1 day ago
  • About the Role ML Ops Engineer — Agentic AI Lab (Founding Team...  ...and production systems — responsible for automating the model training, deployment, versioning...  ...and maintain secure, scalable, and automated pipelines...  ...platform engineering, or infra-focused ML roles Deep... 
    Training
    Full time

    Fabrion

    San Francisco, CA
    3 days ago
  • A leading streaming platform in San Francisco is seeking a Software Engineer to design and build scalable distributed systems. The ideal candidate will have over 8 years of experience with Scala and strong cloud platform expertise, particularly AWS. Responsibilities include... 
    Flexible hours

    Tubi TV

    San Francisco, CA
    5 days ago
  • A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal... 
    Training
    Remote work

    Pluralis Research

    San Francisco, CA
    1 day ago
  • $300k - $405k

     ...A leading AI research company in New York seeks a Machine Learning Systems Engineer to build cutting-edge systems for training AI models. This role involves developing critical algorithms, improving system performance, and collaborating with a dynamic research team. Ideal... 
    Training
    Work at office

    Menlo Ventures

    San Francisco, CA
    5 days ago
  • MakerMaker.AI is seeking a Senior ML Engineer in San Francisco. In this role, you will build and maintain machine learning systems and pipelines for research purposes, ensuring accurate...  ...and owning the data pipelines for training and evaluation. If you have 6+ years of experience... 
    Training

    MakerMaker.AI

    San Francisco, CA
    3 days ago
  • $180k - $250k

     ...You will design and implement innovative model serving architectures while working with the Applied ML team and customers. The ideal candidate has expertise in systems programming and deep understanding of cutting-edge ML infrastructure. Compensation ranges from $180,0... 

    fal

    San Francisco, CA
    4 days ago
  • $295k

     ...the constraints of physical systems to improve people's lives....  ...the Role As a Research Engineer, Distributed Data Systems, you...  ...powers large-scale multimodal training and evaluation at OpenAI. You...  ...infrastructure while ensuring scalability, reliability, and security.... 
    Training
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    3 days ago
  • An AI and Robotics firm in San Francisco seeks a Staff/Principal ML Systems Engineer to enhance training performance for multimodal robotic data. You will lead efforts to improve end-to-end training efficiency and collaborate with a team dedicated to cutting-edge robotics... 
    Training

    Maxwell Bond

    San Francisco, CA
    3 days ago
  •  ...in San Francisco is looking for skilled engineers to work on autonomous R&D systems in machine learning. You will design experiments...  ...that perform reliably in large-scale ML settings. The ideal candidate will have experience in training ML models, backend systems, and a... 
    Training
    Full time

    Thesis (YC F25)

    San Francisco, CA
    3 days ago
