Staff ML Ops Engineer
Albert Invent
Backend & Infrastructure Engineer
Albert's mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.
About the Role
As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers—the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.
We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities—whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before—accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.
What You'll Do
Infrastructure & Kubernetes:
- Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
- Manage containerized services, autoscaling, networking, and resource optimization
Backend Development:
- Design and build high-performance Python APIs and services using FastAPI or similar frameworks
- Architect backend systems for scalability, reliability, and low latency
- Build integrations between AI/ML systems and the broader Albert platform
Distributed Systems:
- Build and operate distributed systems that handle compute-intensive and high-throughput workloads
- Design for fault tolerance, graceful degradation, and horizontal scalability
- Implement async workflows, job queues, and task orchestration as needed
Data Infrastructure:
- Architect and maintain data pipelines and storage systems supporting AI/ML workflows
- Work with vector databases, caches, and other data stores as required by ML systems
- Ensure efficient data access patterns for training and inference workloads
Reliability & Operations:
- Implement observability including logging, metrics, tracing, and alerting
- Own system reliability—troubleshoot issues, conduct post-mortems, and continuously improve
- Design CI/CD pipelines and promote automation best practices
- Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools
Collaboration:
- Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
- Translate ML prototypes and research code into scalable, maintainable systems
- Contribute to technical decisions that shape the team's architecture
You Will Have
- Deep expertise in Python backend development and distributed systems
- Strong Kubernetes and cloud infrastructure experience
- A builder's mindset—you want to create foundational systems that others build on
- Genuine interest in science and technology; curiosity about how your work enables scientific discovery
- A commitment to building systems that are reliable, maintainable, and scalable
Key Competencies
- A degree in Computer Science or a related field, with 7+ years of industry experience in software engineering (Bachelor's) or 5+ years (Master's or PhD)
- Experience supporting AI/ML teams or deploying ML systems in production
- Experience with GPU workloads and scheduling
- Advanced proficiency in Python including async programming and performance optimization
- Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting
- Strong background in distributed systems and microservices architecture
- Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
- Proficiency in REST API development using FastAPI, Flask, or similar
- Experience with containerization and CI/CD pipelines
- Track record of operating production systems at scale
Preferred/Bonus Points
- Familiarity with scientific computing or research environments
- Background in or curiosity about chemistry, materials science, or related fields
- Familiarity with data engineering tools (Airflow, Dagster, or similar)
- Experience with vector databases or search infrastructure
- Expertise in observability tools (Prometheus, Grafana, Datadog)
- Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)
- Contributions to open-source projects
- Experience mentoring engineers
Why Albert?
We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D-printed medical devices.
We love distributed teams. Albert's home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.
We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.
We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.
We're always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!


