Staff AI/ML Engineer - Large-Scale Systems

PrismML

We build high-performance foundation models designed to run efficiently across a wide range of environments, from edge devices to large-scale deployments. Our work spans models from ~1B to 100B+ parameters across LLMs, diffusion models, and other modalities, with a strong focus on scalable training, efficient inference, and real-world deployment.

Role Overview

We are seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model training efforts. This role combines hands-on ownership of large training runs with responsibility for setting technical direction, mentoring engineers, and improving model quality and system performance across the organization.

Responsibilities

You will design, implement, and optimize distributed training systems for large-scale models across all major training phases. Core responsibilities include:
  • Leading model development across pretraining, fine-tuning, and post-training stages
  • Designing and improving data pipelines, including curation, filtering, deduplication, and dataset composition
  • Improving training efficiency, scalability, and reliability across large distributed systems
  • Optimizing model performance with respect to convergence, throughput, memory usage, and stability
  • Translating cutting-edge research into robust, production-ready systems
  • Providing technical leadership through mentoring, design reviews, and cross-functional collaboration

Basic Qualifications

You bring deep experience in large-scale AI/ML systems and strong fundamentals in modern model training:
  • 8–10+ years of experience in machine learning or AI or strong publication record
  • Strong Python programming skills with production-quality code
  • Hands-on experience training large-scale models (multi-billion parameters)
  • Solid understanding of optimization, distributed training, and training dynamics
  • Experience with modern model training workflows (e.g., pretraining, fine-tuning, reinforcement learning approaches)
  • Proven ability to mentor and lead other AI/ML engineers

Preferred Qualifications

You have additional experience aligned with large-scale, high-performance AI/ML systems:
  • Experience training very large models (tens to hundreds of billions of parameters)
  • Familiarity with modern accelerator hardware (e.g., GPUs or TPUs) and distributed training frameworks
  • Experience improving system performance, resource utilization, and training efficiency
  • Exposure to deployment environments with real-world constraints (e.g., latency, cost, or hardware limitations)
  • Experience with advanced optimization techniques and scaling strategies
  • Contributions to research, publications, or open-source AI/ML systems

Ideal Candidate Profile

You have led or significantly contributed to training large models end-to-end, understand common failure modes in large-scale training systems, and know how to debug and improve them. You care about building efficient, reliable systems that work in real-world settings, enjoy mentoring others, and thrive at the intersection of research, engineering, and product.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Staff AI/ML Engineer - Large-Scale Systems in San Francisco, CA vacancy
  • PrismML is seeking a Staff-level AI/ML engineer to lead large-scale model training efforts. This role focuses on technical direction, mentoring engineers, and enhancing model quality and system performance. The ideal candidate will design, implement, and optimize distributed... 
    Suggested

    PrismML

    San Francisco, CA
    12 hours ago
  • $230k - $310k

     ...company in San Francisco is seeking a Staff Engineer to lead critical backend initiatives. This...  ...architecting scalable back-end systems and mentoring engineers while ensuring...  ...expertise in event streaming systems and large-scale APIs. The position offers a competitive... 
    Suggested
    Work at office
    Remote work

    Gamma

    San Francisco, CA
    4 days ago
  •  ...Staff ML Platform Engineer – Large Scale Training (LLMOps/MLOps) We're TrueFoundry, and we're building the foundational infrastructure for production AI systems. We're looking for a Staff ML Platform Engineer – Large Scale Training (LLMOps/MLOps) to join the team.... 
    Suggested
    Flexible hours

    TrueFoundry

    San Francisco, CA
    1 day ago
  • A leading AI research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate will have a strong grasp of optimizing training workloads and experience with distributed... 
    Suggested

    Causal Labs

    San Francisco, CA
    4 days ago
  •  ...AI/ML Engineer (RL & Physical Systems) FLUIX is building the AI Operating System for data centers. We deploy autonomous AI that optimizes, predicts...  ...touch real chillers, real cooling loops, and real megawatt-scale infrastructure. Who You'll Work Closely With Abhi... 
    Suggested
    Weekend work

    Fluix AI

    San Francisco, CA
    2 days ago
  • $147.4k - $272.1k

    Machine Learning Engineer — Large Language Models, Generative AI & Agentic Systems San Francisco Bay Area, California, United States...  ...high-quality inferences at scale! Description We are in search of...  ...matters most is curiosity, strong ML fundamentals, and the ability to... 
    Relocation

    Apple Inc.

    San Francisco, CA
    12 hours ago
  • A leading AI contracting platform is seeking an AI-native builder for its GTM AI & Systems team in San Francisco. This role focuses on replacing manual marketing tasks with...  ..., and collaborating with various teams to scale solutions. Candidates should have experience... 

    Ironclad Inc.

    San Francisco, CA
    12 hours ago
  • A leading AI Time platform provider in Los Angeles seeks a Staff AI Platform Engineer to enhance its AI platform. The role requires deep experience in building AI/ML platforms at scale and strong backend systems knowledge. Responsibilities include owning AI challenges,... 
    Work at office

    Laurel

    San Francisco, CA
    1 day ago
  • $180k - $260k

     ...AI And ML Engineer Profound is on a mission to help companies understand and control their AI presence. As an AI and ML Engineer, you will design, build, and ship large scale NLP and LLM systems that power classification, ranking, clustering, topic discovery, and content... 
    Work at office

    Profound

    San Francisco, CA
    2 days ago
  •  ...Brain Co. is an applied AI startup co-founded by...  ...governments, healthcare systems, and critical industries...  ...The Role As an AI/ML Engineer at Brain Co., you will...  ...verticals. Optimize and Scale: Build scalable data...  ..., and society at large. Engage with Leaders... 
    Worldwide

    Brainco

    San Francisco, CA
    2 days ago
  •  ...Series A-funded agentic AI company building the...  ...our Silicon Valley engineering team — a small,...  ...FAISS, Weaviate) at scale. • Implement agentic systems using LangGraph, LlamaIndex...  ...AI features into large-scale data pipelines...  ...• 3–5 years of AI/ML engineering; minimum... 
    Visa sponsorship

    HOP

    San Francisco, CA
    2 days ago
  •  ...Senior AI / ML Engineer We are seeking a proactive, hands-on Senior ML/...  ...the frontier of intelligent systems within the sector of advanced...  ...based on stakeholder feedback to scale models for production...  ...specifically those utilizing Large Language Models (LLMs) and transformer... 

    Implaion Recruiting

    San Francisco, CA
    3 days ago
  • $170k - $216k

     ...Job Description: AI/ML Python engineer The Perception team builds the system which learns the spatial-temporal representation and their semantic...  ...for efficiently and continuously learning from large scale real-world data, to (2) develop models and model... 
    Full time
    Remote work

    ESR Healthcare

    San Francisco, CA
    12 hours ago
  •  ...About the job Applied AI / ML Engineer About Us Catalyst Labs is...  ...Established tech companies: scaling their ML infrastructure, recommendation systems, and data platforms, and Enterprise...  ...of resources and reach of a large multi national firm. Roles &... 
    Full time
    Visa sponsorship

    Catalyst Labs, LLC

    San Francisco, CA
    4 days ago
  • $300k - $400k

     ...Principal AI/ML Engineer - AdTech San Francisco, California, United States Zeta Global...  ...creative content generation, operating at large scale and low latency to handle billions of...  ...and data science teams to ensure our ML systems are highly performant, scalable, and... 

    Zeta Global

    San Francisco, CA
    1 day ago
  • $220k - $255.8k

     ...Seattle/WA. Team: AI Platform Engineering, WEX Inc. About...  ...to build, deploy, and scale AI-powered experiences...  ...about building systems that make AI a core part...  ...If you're excited by Large Language Models, Agentic...  ...Design and maintain ML pipelines, from data ingestion... 
    Remote work
    Flexible hours

    WEX

    San Francisco, CA
    3 days ago
  • $308k - $423.5k

     ...We are seeking a  Principal AI / ML Engineer to be a  company-level technical...  ...and lead deployment of AI systems (LLM fine-tuning, RLHF, agent...  ...deploying machine learning models at scale, conducting applied AI...  ...background: experience with large-scale data pipelines, ML feature... 
    Work experience placement
    Work at office
    Local area
    Remote work
    Monday to Friday
    Flexible hours
    3 days per week

    Faire Inc

    San Francisco, CA
    2 days ago
  • $202k

    About the Role (Sr AI/ML Engineer : Not Data Scientist) Core Security Engineering...  ...for providing and managing systems, services, and libraries to...  ..., and enforcement at scale. The scope spans across multiple...  ...retraining. Familiarity with large‑scale data/infra systems (Kafka... 
    Full time

    Uber

    San Francisco, CA
    3 days ago
  • $190k - $260k

    A leading AI-driven recruiting platform in San Francisco seeks a Senior/Staff AI/ML Engineer to design and implement innovative AI features. You will develop intelligent search systems and contribute to shaping AI-driven recruitment solutions. The role emphasizes collaboration... 
    Work at office
    Flexible hours
    3 days per week

    SupportFinity™

    San Francisco, CA
    3 days ago
  • $140k - $185k

     ...develop edge machine learning systems to improve the autonomy and...  ...robots Build scalable ML infrastructure for model training...  ...construction environments Analyze large-scale operational datasets to...  ...based commercial and open-source AI tools into our autonomy stack... 
    Local area
    Flexible hours

    Built Robotics Inc

    San Francisco, CA
    12 hours ago
  • A leading AI technology firm located in San Francisco is seeking a Research Engineer specializing in AI Performance & Kernel Optimization...  ...the performance of large-scale AI systems, optimizing kernels, and collaborating...  ...and experience with ML workloads. Benefits include... 

    Zyphra

    San Francisco, CA
    1 day ago
  • A leading AI company in San Francisco is seeking a skilled ML Infrastructure Engineer to manage and optimize large-scale training systems. In this role, you will design and maintain infrastructure for model training, ensuring efficient GPU/TPU utilization while working... 

    Physical Intelligence

    San Francisco, CA
    2 days ago
  •  ...wide range of environments—from edge devices to large-scale deployments. Our work spans models from ~1B...  ...deployment. Role Overview We are seeking a Staff-level (or higher) AI/ML engineer with expertise in multimodal systems to lead the development of capabilities that... 

    PrismML

    San Francisco, CA
    4 days ago
  • A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal candidates... 
    Remote work

    Pluralis Research

    San Francisco, CA
    12 hours ago
  •  ...are building next-generation AI creative tools. We are dedicated...  ...We're looking for an ML Engineer to architect and build Krea's...  ...personalization and recommendation systems from scratch. You'll have...  ...points Experience with large-scale data systems and production ML... 
    H1b

    Krea.ai, Inc

    San Francisco, CA
    2 days ago
  • Job Title AI/ML Research Engineer Company Description Generalcatalyst.com - YC W...  ...models and vision-language systems that automate complex manual...  ...reason, plan, and execute at scale. High‑velocity research environment...  ...for GUI automation using large reasoning models and chain‑... 

    Jack & Jill/External ATS

    San Francisco, CA
    4 days ago
  • Sydecar in San Francisco is seeking a Staff Software Engineer to lead complex projects and mentor a team of engineers. The successful candidate...  ...in JavaScript/TypeScript, and expertise in building large-scale systems. You will be responsible for outlining technology... 

    Sydecar

    San Francisco, CA
    1 day ago
  • $181.1k - $318.4k

    Apple Inc. is looking for a Staff ML Infrastructure Engineer in San Francisco to lead pre-training initiatives for cutting-edge foundation models in...  ...over 6 years of experience in building scalable backend systems, be proficient in Python and Go, and possess strong knowledge... 

    Apple Inc.

    San Francisco, CA
    2 days ago
  • $197.3k - $225.1k

     ...Lead AI/ML Engineer (Platform, kubeflow) Overview At Capital One,...  ...responsible and reliable AI systems, changing banking for good. For...  ...including foundation model training, large language model inference,...  ..., throughput — of large scale production AI systems.... 
    Full time
    Part time
    Local area

    Capital One Financial Corp

    San Francisco, CA
    4 days ago
  • $214k - $300k

    Monograph is seeking an engineer to build and improve AI evaluation systems aimed at increasing shipping quality for AI tools. You will enhance scalable eval runners, improve benchmarks, and ensure reliability in distributed systems. Strong engineering fundamentals and... 

    Monograph

    San Francisco, CA
    1 day ago
