Staff ML Ops Engineer

Albert Invent

Albert's mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers: the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.

We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities, whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before: accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.

What you'll do

Infrastructure & Kubernetes:

Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads

Manage containerized services, autoscaling, networking, and resource optimization

Backend Development:

Design and build high-performance Python APIs and services using FastAPI or similar frameworks

Architect backend systems for scalability, reliability, and low latency

Build integrations between AI/ML systems and the broader Albert platform

Distributed Systems:

Build and operate distributed systems that handle compute-intensive and high-throughput workloads

Design for fault tolerance, graceful degradation, and horizontal scalability

Implement async workflows, job queues, and task orchestration as needed

Data Infrastructure:

Architect and maintain data pipelines and storage systems supporting AI/ML workflows

Work with vector databases, caches, and other data stores as required by ML systems

Ensure efficient data access patterns for training and inference workloads

Reliability & Operations:

Implement observability including logging, metrics, tracing, and alerting

Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve

Design CI/CD pipelines and promote automation best practices

Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools

Collaboration:

Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure

Translate ML prototypes and research code into scalable, maintainable systems

Contribute to technical decisions that shape the team's architecture

You will have

Deep expertise in Python backend development and distributed systems

Strong Kubernetes and cloud infrastructure experience

A builder's mindset: you want to create foundational systems that others build on

Genuine interest in science and technology; curiosity about how your work enables scientific discovery

A commitment to building systems that are reliable, maintainable, and scalable

Key competencies

A degree in Computer Science or a related field, with 7+ years of industry software engineering experience (Bachelor's) or 5+ years (Master's or PhD)

Experience supporting AI/ML teams or deploying ML systems in production

Experience with GPU workloads and scheduling

Advanced proficiency in Python including async programming and performance optimization

Deep experience with Kubernetes: cluster management, networking, autoscaling, and troubleshooting

Strong background in distributed systems and microservices architecture

Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code

Proficiency in REST API development using FastAPI, Flask, or similar

Experience with containerization and CI/CD pipelines

Track record of operating production systems at scale

Preferred/Bonus Points

Familiarity with scientific computing or research environments

Background in or curiosity about chemistry, materials science, or related fields

Familiarity with data engineering tools (Airflow, Dagster, or similar)

Experience with vector databases or search infrastructure

Expertise in observability tools (Prometheus, Grafana, Datadog)

Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)

Contributions to open-source projects

Experience mentoring engineers

Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D printed medical devices.

We love distributed teams. Albert's home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We're always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!

Vacancy posted 19 hours ago
