Staff ML Ops Engineer

Albert Invent

Backend & Infrastructure Engineer

Albert's mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the Role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers—the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.

We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities—whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before—accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.

What You'll Do

Infrastructure & Kubernetes:

Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
Manage containerized services, autoscaling, networking, and resource optimization

Backend Development:

Design and build high-performance Python APIs and services using FastAPI or similar frameworks
Architect backend systems for scalability, reliability, and low latency
Build integrations between AI/ML systems and the broader Albert platform

Distributed Systems:

Build and operate distributed systems that handle compute-intensive and high-throughput workloads
Design for fault tolerance, graceful degradation, and horizontal scalability
Implement async workflows, job queues, and task orchestration as needed

Data Infrastructure:

Architect and maintain data pipelines and storage systems supporting AI/ML workflows
Work with vector databases, caches, and other data stores as required by ML systems
Ensure efficient data access patterns for training and inference workloads

Reliability & Operations:

Implement observability including logging, metrics, tracing, and alerting
Own system reliability—troubleshoot issues, conduct post-mortems, and continuously improve
Design CI/CD pipelines and promote automation best practices
Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools

Collaboration:

Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
Translate ML prototypes and research code into scalable, maintainable systems
Contribute to technical decisions that shape the team's architecture

You Will Have

Deep expertise in Python backend development and distributed systems
Strong Kubernetes and cloud infrastructure experience
A builder's mindset—you want to create foundational systems that others build on
Genuine interest in science and technology; curiosity about how your work enables scientific discovery
A commitment to building systems that are reliable, maintainable, and scalable

Key Competencies

A degree in Computer Science or a related field, with 7+ years of industry software engineering experience (Bachelor's) or 5+ years (Master's or PhD)
Experience supporting AI/ML teams or deploying ML systems in production
Experience with GPU workloads and scheduling
Advanced proficiency in Python including async programming and performance optimization
Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting
Strong background in distributed systems and microservices architecture
Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
Proficiency in REST API development using FastAPI, Flask, or similar
Experience with containerization and CI/CD pipelines
Track record of operating production systems at scale

Preferred/Bonus Points

Familiarity with scientific computing or research environments
Background in or curiosity about chemistry, materials science, or related fields
Familiarity with data engineering tools (Airflow, Dagster, or similar)
Experience with vector databases or search infrastructure
Expertise in observability tools (Prometheus, Grafana, Datadog)
Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)
Contributions to open-source projects
Experience mentoring engineers

Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D-printed medical devices.

We love distributed teams. Albert's home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We're always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!

Vacancy posted 10 hours ago
