Staff AI/ML Engineer: Large-Scale Training Systems

PrismML

PrismML is seeking a Staff-level AI/ML engineer to lead large-scale model training efforts. This role focuses on technical direction, mentoring engineers, and enhancing model quality and system performance. The ideal candidate will design, implement, and optimize distributed training systems for large models while providing guidance and leadership. Successful applicants will have extensive experience in AI/ML systems, strong Python skills, and familiarity with modern training frameworks. Exciting challenges await in building efficient, production-ready systems.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Staff AI/ML Engineer: Large-Scale Training Systems vacancy in San Francisco, CA
  •  ...environments—from edge devices to large-scale deployments. Our work...  ...strong focus on scalable training, efficient inference, and...  ...We are seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model...  ...improving model quality and system performance across the organization... 
    Training

    PrismML

    San Francisco, CA
    1 day ago
  •  ...Staff ML Platform Engineer – Large Scale Training (LLMOps/MLOps) We're TrueFoundry, and we're building the foundational infrastructure for production AI systems. We're looking for a Staff ML Platform Engineer – Large Scale Training (LLMOps/MLOps) to join the team.... 
    Training
    Flexible hours

    TrueFoundry

    San Francisco, CA
    3 days ago
  • A leading AI research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate will have a strong grasp of optimizing training workloads and experience with distributed... 
    Training

    Causal Labs

    San Francisco, CA
    1 day ago
  •  ...AI/ML Engineer (RL & Physical Systems) FLUIX is building the AI Operating System for data centers. We deploy...  ...cooling loops, and real megawatt-scale infrastructure. Who You'll Work Closely...  ...environments to accelerate training, testing, and Sim2Real deployment.... 
    Training
    Weekend work

    Fluix AI

    San Francisco, CA
    4 days ago
  • $147.4k - $272.1k

    Machine Learning Engineer — Large Language Models, Generative AI & Agentic Systems San Francisco Bay Area, California...  ...-quality inferences at scale! Description We are in...  ...is curiosity, strong ML fundamentals, and the...  ...Experience with model training, fine-tuning, or building... 
    Training
    Relocation

    Apple Inc.

    San Francisco, CA
    2 days ago
  • $230k - $310k

     ...company in San Francisco is seeking a Staff Engineer to lead critical backend initiatives. This...  ...architecting scalable back-end systems and mentoring engineers while ensuring...  ...expertise in event streaming systems and large-scale APIs. The position offers a competitive... 
    Work at office
    Remote work

    Gamma

    San Francisco, CA
    1 day ago
  • $170k - $216k

     ...Job Description: AI/ML Python engineer The Perception team builds the system which learns the spatial-temporal representation...  ...and continuously learning from large scale real-world data, to (2) develop models and model training at scale, to (3) analyze real-... 
    Training
    Full time
    Remote work

    ESR Healthcare

    San Francisco, CA
    2 days ago
  •  ...Senior AI / ML Engineer We are seeking a proactive, hands...  ...frontier of intelligent systems within the sector of...  ...stakeholder feedback to scale models for production...  ...track record of building, training, and deploying ML and...  ...those utilizing Large Language Models (LLMs)... 
    Training

    Implaion Recruiting

    San Francisco, CA
    8 hours ago
  • $300k - $400k

     ...Global (NYSE: ZETA) is the AI-Powered Marketing...  ...Description As a Principal AI/ML Engineer in our AdTech team,...  ..., operating at large scale and low latency to handle...  ...to ensure our ML systems are highly performant,...  ...from data ingestion and training to real-time inference... 
    Training

    Zeta Global

    San Francisco, CA
    4 days ago
  • $308k - $423.5k

     ...seeking a  Principal AI / ML Engineer to be a  company-level...  ...lead deployment of AI systems (LLM fine-tuning, RLHF...  ...+ years of experience training and deploying machine learning models at scale, conducting applied AI...  ...background: experience with large-scale data pipelines,... 
    Training
    Work experience placement
    Work at office
    Local area
    Remote work
    Monday to Friday
    Flexible hours
    3 days per week

    Faire Inc

    San Francisco, CA
    4 days ago
  • $140k - $185k

     ...develop edge machine learning systems to improve the...  ...construction robots Build scalable ML infrastructure for model training, validation, deployment,...  ...environments Analyze large-scale operational datasets to...  ...commercial and open-source AI tools into our autonomy stack... 
    Training
    Local area
    Flexible hours

    Built Robotics Inc

    San Francisco, CA
    2 days ago
  • A leading AI company in San Francisco is seeking a skilled ML Infrastructure Engineer to manage and optimize large-scale training systems. In this role, you will design and maintain infrastructure for model training, ensuring efficient GPU/TPU utilization while working... 
    Training

    Physical Intelligence

    San Francisco, CA
    4 days ago
  •  ...environments—from edge devices to large-scale deployments. Our work spans...  ...a strong focus on scalable training, efficient inference, and...  ...Overview We are seeking a Staff-level (or higher) AI/ML engineer with expertise in multimodal systems to lead the development of capabilities... 
    Training

    PrismML

    San Francisco, CA
    1 day ago
  • A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal... 
    Training
    Remote work

    Pluralis Research

    San Francisco, CA
    2 days ago
  • $181.1k - $318.4k

    Apple Inc. is looking for a Staff ML Infrastructure Engineer in San Francisco to lead pre-training initiatives for cutting-edge foundation models in machine learning...  ...years of experience in building scalable backend systems, be proficient in Python and Go, and possess... 
    Training

    Apple Inc.

    San Francisco, CA
    4 days ago
  • $197.3k - $225.1k

     ...Lead AI/ML Engineer (Platform, kubeflow) Overview At Capital One...  ...responsible and reliable AI systems, changing banking for good....  ...including foundation model training, large language model inference, similarity...  ..., throughput — of large scale production AI systems.... 
    Training
    Full time
    Part time
    Local area

    Capital One Financial Corp

    San Francisco, CA
    1 day ago
  • A leading tech company in San Francisco seeks a Machine Learning Engineer to build and maintain infrastructure for large-scale model training. In this hands-on role, you will design systems, work closely with researchers, and optimize training processes. Candidates should... 
    Training

    Monograph

    San Francisco, CA
    2 days ago
  • A leading AI technology company in San Francisco...  ...for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation...  ...distributed training systems and optimize GPU...  ...years of experience in ML infrastructure and a strong... 
    Training

    Baseten

    San Francisco, CA
    2 days ago
  • $216k - $270k

     ...Scale's Physical AI business unit is dedicated to solving the data...  ...Physical AI and developing ML pipelines for processing, training, and fine-tuning on...  ...The Role As an ML Systems Engineer on the Physical AI team...  ...of experience building large-scale, high-performance... 
    Training
    Full time

    Scale AI

    San Francisco, CA
    8 hours ago
  • B Capital in San Francisco is looking for an engineering professional to architect and optimize core training infrastructure for their AI models. You will work on distributed systems and large-scale data pipelines, focusing on performance and numerical stability. Successful... 
    Training

    B Capital

    San Francisco, CA
    3 days ago
  • $189.6k - $237k

     ...Scale's ML platform (RLXF) team builds our internal...  ...distributed framework for large language model training and inference. The...  ...of the field of AI as an indispensable...  ...to optimize our ML system Ideally you'd have:...  ...systems Strong software engineering skills, proficient... 
    Training
    Full time

    DiversityJobs Inc

    San Francisco, CA
    12 days ago
  • Genesis AI in San Francisco is looking for an experienced professional to optimize and build distributed training systems using PyTorch. The ideal candidate has over 8 years of experience...  ...and developing tools for monitoring large-scale runs. This role requires a system-... 
    Training

    Genesis AI

    San Francisco, CA
    2 days ago
  • A pioneering AI firm based in San Francisco is seeking a Research Engineer, Distributed Data Systems. In this role, you will design and maintain infrastructure for large-scale multimodal training, ensuring scalability and reliability of data systems. Candidates should... 
    Training
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    3 days ago
  • $218.4k - $273k

    Scale's Physical AI business unit is dedicated to solving the data...  ...Physical AI and developing ML pipelines for processing, training, and fine-tuning on...  ...AI. The Role As an ML Systems Engineer on the Physical AI team...  ...of experience building large-scale, high-performance... 
    Training
    Full time

    Scale AI, Inc.

    San Francisco, CA
    3 days ago
  •  ...Navi AI Pilot Debrief Intelligence Engineer Navi captures everything a pilot...  ...This is a founding AI/ML role. You'll own the...  ...and improve the ML systems that power Navi's...  ...data ingestion, model training, evaluation, inference...  ...A role that scales into technical leadership... 
    Training

    Navi AI

    San Francisco, CA
    1 day ago
  • $200k - $250k

     ...is a job that Jill, our AI Recruiter, is...  ...Job Title: Founding AI/ML Engineer Salary: $200-250K +...  ...algorithms into scalable ML systems. This high-impact role...  ...-capitalized for rapid scale. What you will do:...  ...data processing, model training, and production-grade deployment... 
    Training

    Jack and Jill AI

    San Francisco, CA
    1 day ago
  •  ...AI/ML Engineer (Computer Vision) A fast-growing applied AI company is...  ...analyze visual information at scale. In this role, you will help...  ...Design and deploy visual AI systems that turn complex technical...  ...improve model pipelines across training, evaluation, inference, and... 
    Training

    Blue Signal Search

    San Francisco, CA
    8 hours ago
  •  ...building the world’s first AI teacher: one that...  ...kids love it. Now, we’re scaling that success into...  ...We’re looking for an AI/ML Engineer to join our product engineering...  ...’s education. Own AI systems and features end-to-end...  ...don’t need deep model training expertise, but you do... 
    Training
    Work at office
    Worldwide
    Shift work

    Ello Technology, Inc

    San Francisco, CA
    1 day ago
  • $240k - $270k

     ...we’re not just adding AI—we’re building the future...  ...you come in. As an AI/ML Engineer, you’ll join a growing...  ...Prototype and productionize AI systems that feel intuitive but...  ...more * Develop and scale AI/ML infrastructure...  ...: data curation, training, deployment, monitoring... 
    Training
    Full time
    Work at office
    Flexible hours

    Sigma Computing

    San Francisco, CA
    8 hours ago
  • $270k - $340k

     ...Research Scientist - Scaling P-1227 About Databricks AI At Databricks,...  ...from post‑training open source LLMs...  ...the boundaries of large language model (...  ...across algorithms, systems, and...  ...implementation details with engineering partners. Role...  ...end‑to‑end ML systems for distributed... 
    Training
    Local area
    Worldwide

    I did my part and supported the Regular Toilet

    San Francisco, CA
    5 days ago
