Staff ML Ops Engineer

Albert Invent

Albert's mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers: the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.


We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities, whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before: accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.


What you'll do

Infrastructure & Kubernetes:
  • Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
  • Manage containerized services, autoscaling, networking, and resource optimization
Backend Development:
  • Design and build high-performance Python APIs and services using FastAPI or similar frameworks
  • Architect backend systems for scalability, reliability, and low latency
  • Build integrations between AI/ML systems and the broader Albert platform
Distributed Systems:
  • Build and operate distributed systems that handle compute-intensive and high-throughput workloads
  • Design for fault tolerance, graceful degradation, and horizontal scalability
  • Implement async workflows, job queues, and task orchestration as needed
Data Infrastructure:
  • Architect and maintain data pipelines and storage systems supporting AI/ML workflows
  • Work with vector databases, caches, and other data stores as required by ML systems
  • Ensure efficient data access patterns for training and inference workloads
Reliability & Operations:
  • Implement observability including logging, metrics, tracing, and alerting
  • Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve
  • Design CI/CD pipelines and promote automation best practices
  • Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools
Collaboration:
  • Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
  • Translate ML prototypes and research code into scalable, maintainable systems
  • Contribute to technical decisions that shape the team's architecture
You will have
  • Deep expertise in Python backend development and distributed systems
  • Strong Kubernetes and cloud infrastructure experience
  • A builder's mindset: you want to create foundational systems that others build on
  • Genuine interest in science and technology; curiosity about how your work enables scientific discovery
  • A commitment to building systems that are reliable, maintainable, and scalable
Key competencies
  • A degree in Computer Science or a related field with 7+ years of industry experience (Bachelor's) or 5+ years (Master's or PhD) in software engineering
  • Experience supporting AI/ML teams or deploying ML systems in production
  • Experience with GPU workloads and scheduling
  • Advanced proficiency in Python including async programming and performance optimization
  • Deep experience with Kubernetes: cluster management, networking, autoscaling, and troubleshooting
  • Strong background in distributed systems and microservices architecture
  • Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
  • Proficiency in REST API development using FastAPI, Flask, or similar
  • Experience with containerization and CI/CD pipelines
  • Track record of operating production systems at scale
Preferred/Bonus Points
  • Familiarity with scientific computing or research environments
  • Background in or curiosity about chemistry, materials science, or related fields
  • Familiarity with data engineering tools (Airflow, Dagster, or similar)
  • Experience with vector databases or search infrastructure
  • Expertise in observability tools (Prometheus, Grafana, Datadog)
  • Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)
  • Contributions to open-source projects
  • Experience mentoring engineers
Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D-printed medical devices.

We love distributed teams. Albert's home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We're always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!
Vacancy posted 13 hours ago