Staff AI/ML Engineer - Large-Scale Systems

PrismML

We build high-performance foundation models designed to run efficiently across a wide range of environments, from edge devices to large-scale deployments. Our work spans models from ~1B to 100B+ parameters across LLMs, diffusion models, and other modalities, with a strong focus on scalable training, efficient inference, and real-world deployment.

Role Overview

We are seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model training efforts. This role combines hands-on ownership of large training runs with responsibility for setting technical direction, mentoring engineers, and improving model quality and system performance across the organization.

Responsibilities

You will design, implement, and optimize distributed training systems for large-scale models across all major training phases. Core responsibilities include:

- Leading model development across pretraining, fine-tuning, and post-training stages
- Designing and improving data pipelines, including curation, filtering, deduplication, and dataset composition
- Improving training efficiency, scalability, and reliability across large distributed systems
- Optimizing model performance with respect to convergence, throughput, memory usage, and stability
- Translating cutting-edge research into robust, production-ready systems
- Providing technical leadership through mentoring, design reviews, and cross-functional collaboration

Basic Qualifications

You bring deep experience in large-scale AI/ML systems and strong fundamentals in modern model training:

- 8–10+ years of experience in machine learning or AI or strong publication record
- Strong Python programming skills with production-quality code
- Hands-on experience training large-scale models (multi-billion parameters)
- Solid understanding of optimization, distributed training, and training dynamics
- Experience with modern model training workflows (e.g., pretraining, fine-tuning, reinforcement learning approaches)
- Proven ability to mentor and lead other AI/ML engineers

Preferred Qualifications

You have additional experience aligned with large-scale, high-performance AI/ML systems:

- Experience training very large models (tens to hundreds of billions of parameters)
- Familiarity with modern accelerator hardware (e.g., GPUs or TPUs) and distributed training frameworks
- Experience improving system performance, resource utilization, and training efficiency
- Exposure to deployment environments with real-world constraints (e.g., latency, cost, or hardware limitations)
- Experience with advanced optimization techniques and scaling strategies
- Contributions to research, publications, or open-source AI/ML systems

Ideal Candidate Profile

You have led or significantly contributed to training large models end-to-end, understand common failure modes in large-scale training systems, and know how to debug and improve them. You care about building efficient, reliable systems that work in real-world settings, enjoy mentoring others, and thrive at the intersection of research, engineering, and product.

Vacancy posted 4 days ago
