ML Ops Engineer Agentic AI Lab (Founding Team)

Fabrion

About the Role

Location: San Francisco Bay Area
Type: Full-Time
Compensation: Competitive salary + meaningful equity (founding tier)

Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems. Our AI Lab is pioneering the future of intelligent infrastructure through open-source LLMs, agent-native pipelines, retrieval-augmented generation (RAG), and knowledge-graph-grounded models.

We’re hiring an ML Ops Engineer to be the glue between ML research and production systems, responsible for automating the model training, deployment, versioning, and observability pipelines that power our agents and AI data fabric. You’ll work across compute orchestration, GPU infrastructure, fine-tuned model lifecycle management, model governance, and security.

Responsibilities

- Build and maintain secure, scalable, and automated pipelines for:
  - LLM fine-tuning, SFT, LoRA, RLHF, DPO training
  - RAG embedding pipelines with dynamic updates
  - Model conversion, quantization, and inference rollout
- Manage hybrid compute infrastructure (cloud, on-prem, GPU clusters) for training and inference workloads using Kubernetes, Ray, and Terraform
- Containerize models and agents using Docker, with reproducible builds and CI/CD via GitHub Actions or ArgoCD
- Implement and enforce model governance: versioning, metadata, lineage, reproducibility, and evaluation capture
- Create and manage evaluation and benchmarking frameworks (e.g. OpenLLM-Evals, RAGAS, LangSmith)
- Integrate with security and access control layers (OPA, ABAC, Keycloak) to enforce model policies per tenant
- Instrument observability for model latency, token usage, performance metrics, error tracing, and drift detection
- Support deployment of agentic apps with LangGraph, LangChain, and custom inference backends (e.g. vLLM, TGI, Triton)

Desired Experience

Model Infrastructure
- 4+ years in MLOps, ML platform engineering, or infra-focused ML roles
- Deep familiarity with model lifecycle management tools: MLflow, Weights & Biases, DVC, HuggingFace Hub
- Experience with large model deployments (open-source LLMs preferred): LLaMA, Mistral, Falcon, Mixtral
- Comfortable with tuning libraries (HuggingFace Trainer, DeepSpeed, FSDP, QLoRA)
- Familiarity with inference serving: vLLM, TGI, Ray Serve, Triton Inference Server

Automation + Infra
- Proficient with Terraform, Helm, K8s, and container orchestration
- Experience with CI/CD for ML (e.g. GitHub Actions + model checkpoints)
- Managed hybrid workloads across GPU cloud (Lambda, Modal, HuggingFace Inference, SageMaker)
- Familiar with cost optimization (spot instance scaling, batch prioritization, model sharding)

Agent + Data Pipeline Support
- Familiarity with LangChain, LangGraph, LlamaIndex or similar RAG/agent orchestration tools
- Built embedding pipelines for multi-source documents (PDF, JSON, CSV, HTML)
- Integrated with vector databases (Weaviate, Qdrant, FAISS, Chroma)

Security & Governance
- Implemented model-level RBAC, usage tracking, audit trails
- Integrated with API rate limits, tenant billing, and SLA observability
- Experience with policy-as-code systems (OPA, Rego) and access layers

Preferred Stack

- LLM Ops: HuggingFace, DeepSpeed, MLflow, Weights & Biases, DVC
- Infra: Kubernetes (GKE/EKS), Ray, Terraform, Helm, GitHub Actions, ArgoCD
- Serving: vLLM, TGI, Triton, Ray Serve
- Pipelines: Prefect, Airflow, Dagster
- Monitoring: Prometheus, Grafana, OpenTelemetry, LangSmith
- Security: OPA (Rego), Keycloak, Vault
- Languages: Python (primary), Bash, optionally Rust or Go for tooling

Mindset & Culture Fit

- Builder's mindset with startup autonomy: you automate what slows you down
- Obsessive about reproducibility, observability, and traceability
- Comfortable with a hybrid team of AI researchers, DevOps, and backend engineers
- Interested in aligning ML systems to product delivery, not just papers
- Bonus: experience with SOC2, HIPAA, or GovCloud-grade model operations

What We’re Looking For

Experience:
- 5+ years as a full stack or backend engineer
- Experience owning and delivering production systems end-to-end
- Prior experience with modern frontend frameworks (React, Next.js)
- Familiarity with building APIs, databases, cloud infrastructure, or deployment workflows at scale
- Comfortable working in early-stage startups or autonomous roles; prior experience as a founder, founding engineer, or at a 0-1 pre-seed startup is a big plus

Mindset:
- Comfortable with ambiguity, eager to prototype and iterate quickly
- Strong sense of ownership: prefers to build systems rather than wait for tickets
- Enjoys thinking about architecture, performance, and tradeoffs at every level
- Clear communicator and pragmatic team player
- Values equity and impact over prestige or hierarchy
- Prior startup or founding team experience

Why This Role Matters

Your work will enable models and agents to be trained, evaluated, deployed, and governed at scale, across many tenants, models, and tasks. This is the backbone of a secure, reliable, and scalable AI-native enterprise system.

If you dream about using AI to solve really hard real-world problems, we would love to hear from you.


Vacancy posted 4 days ago

