Staff ML Ops Engineer

Albert Invent

Backend & Infrastructure Engineer

Albert's mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the Role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers—the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.

We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities—whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before—accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.

What You'll Do

Infrastructure & Kubernetes:

  • Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
  • Manage containerized services, autoscaling, networking, and resource optimization

Backend Development:

  • Design and build high-performance Python APIs and services using FastAPI or similar frameworks
  • Architect backend systems for scalability, reliability, and low latency
  • Build integrations between AI/ML systems and the broader Albert platform

Distributed Systems:

  • Build and operate distributed systems that handle compute-intensive and high-throughput workloads
  • Design for fault tolerance, graceful degradation, and horizontal scalability
  • Implement async workflows, job queues, and task orchestration as needed

Data Infrastructure:

  • Architect and maintain data pipelines and storage systems supporting AI/ML workflows
  • Work with vector databases, caches, and other data stores as required by ML systems
  • Ensure efficient data access patterns for training and inference workloads

Reliability & Operations:

  • Implement observability including logging, metrics, tracing, and alerting
  • Own system reliability—troubleshoot issues, conduct post-mortems, and continuously improve
  • Design CI/CD pipelines and promote automation best practices
  • Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools

Collaboration:

  • Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
  • Translate ML prototypes and research code into scalable, maintainable systems
  • Contribute to technical decisions that shape the team's architecture

You Will Have

  • Deep expertise in Python backend development and distributed systems
  • Strong Kubernetes and cloud infrastructure experience
  • A builder's mindset—you want to create foundational systems that others build on
  • Genuine interest in science and technology; curiosity about how your work enables scientific discovery
  • A commitment to building systems that are reliable, maintainable, and scalable

Key Competencies

  • A degree in Computer Science or a related field, with 7+ years of software engineering industry experience (Bachelor's) or 5+ years (Master's or PhD)
  • Experience supporting AI/ML teams or deploying ML systems in production
  • Experience with GPU workloads and scheduling
  • Advanced proficiency in Python including async programming and performance optimization
  • Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting
  • Strong background in distributed systems and microservices architecture
  • Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
  • Proficiency in REST API development using FastAPI, Flask, or similar
  • Experience with containerization and CI/CD pipelines
  • Track record of operating production systems at scale

Preferred/Bonus Points

  • Familiarity with scientific computing or research environments
  • Background in or curiosity about chemistry, materials science, or related fields
  • Familiarity with data engineering tools (Airflow, Dagster, or similar)
  • Experience with vector databases or search infrastructure
  • Expertise in observability tools (Prometheus, Grafana, Datadog)
  • Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)
  • Contributions to open-source projects
  • Experience mentoring engineers

Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D-printed medical devices.

We love distributed teams. Albert's home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We're always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!

Vacancy posted 10 hours ago