Senior Site Reliability Engineer — AI Cloud Reliability

Dormont Manufacturing Co

Crusoe is seeking a Site Reliability Engineer in San Francisco to ensure the stability and performance of its GPU cloud platform. Successful candidates will have a minimum of 5 years in cloud operations and strong knowledge of tools like Prometheus and Grafana. This role involves collaboration across teams to improve metrics and handle incident response. Benefits include industry-competitive pay, health insurance, and a 401(k) match.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Senior Site Reliability Engineer — AI Cloud Reliability in San Francisco, CA vacancy
  • Anyscale is seeking a Senior Site Reliability Engineer to join our Infrastructure team in San Francisco, California...  ...candidate will enhance distributed AI application development and work on...  ...with strong experience in Kubernetes and cloud-native technologies. This role focuses... 
    Senior

    Anyscale

    San Francisco, CA
    2 days ago
  •  ...stacks to accelerate the progress of AI applications out into the real world....  ...About the role Anyscale is looking for a Senior Site Reliability Engineer to join the Infrastructure team....  ...running distributed AI applications in the cloud as easy as on your laptop. As part of... 
    Senior

    Anyscale

    San Francisco, CA
    3 days ago
  •  ...security, delivering an AI-powered platform that governs...  ...complex, distributed, cloud‑native systems. As a Staff Platform Engineer, you will play a...  ...leadership role. You will own reliability for major platform...  ...Platform Engineering, or Site Reliability Engineering... 
    Senior

    Saviynt

    San Francisco, CA
    19 days ago
  •  ...About the job Senior Site Reliability Engineer About the Company Stellar is a decentralized, public blockchain...  ...~5+ years of experience of working in cloud-based systems operations, as a SRE or...  ...code Experience experimenting with AI-driven approaches to operations... 
    Senior

    TechChain Talent

    San Francisco, CA
    4 days ago
  •  ...Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources...  ...About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and... 
    Senior

    Hyperbolic Labs

    San Francisco, CA
    3 days ago
  • $160k - $250k

     ...maintain a hybrid infrastructure with public clouds when the right fit. As we continue to...  ..., we also need to grow our DevOps and Site Reliability team to maintain the reliability of our...  ...passionate about creating a revolutionary AI company. At Hive, you will have a steep... 
    Senior

    Hive

    San Francisco, CA
    3 days ago
  •  ...create the next generation of Gen AI-driven code reviewers: a...  ...significantly outperforms individual engineers. We combine language models...  ...are seeking an experienced Site Reliability Engineer to join our Platform...  ...infrastructure on Google Cloud Platform to support CodeRabbit... 
    Senior

    CodeRabbit

    San Francisco, CA
    16 hours ago
  • OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... 
    Senior
    Flexible hours

    OutSystems, Inc.

    San Francisco, CA
    4 days ago
  • $300k

    Join a stealth-mode startup building out their AI and cloud platform, powered by thousands of H100s, H200s, and B200s, ready...  ...full-scale model training, or inference. As a Platform Engineer/Senior Site Reliability Engineer, you’ll own the reliability, performance, and automation... 
    Senior

    Hamilton Barnes Associates Limited

    San Francisco, CA
    1 day ago
  •  ...services and teams to the reliability tenets. Establish and maintain...  ...infrastructure, ensuring cloud‑native best practices. Collaborate...  ...in Python, using Gen AI tooling to accelerate...  ...6+ years of experience in Site Reliability Engineering, managing infrastructure and... 
    Senior

    OutSystems, Inc.

    San Francisco, CA
    4 days ago
  •  ...human. Heidi is building an AI Care Partner that works alongside...  .... We’re a team of doctors, engineers, designers, researchers, and...  ...to-end. Improve operational reliability: Identify recurring issues...  ...improve Kubernetes clusters, cloud infrastructure, and core platform... 
    Senior
    Work at office
    Worldwide

    Heidi

    San Francisco, CA
    2 days ago
  •  ...acquisition, and Connor was a machine learning research engineer at Scale AI. The rest of our team comes from companies like...  ...go-to-market with state-of-the-art AI. As a Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes of... 
    Senior

    Unify

    San Francisco, CA
    4 days ago
  • $60 per hour

    Senior Site Reliability Engineer (Copy) Seattle Hybrid (Hybrid location). Full-time. About Us Supio is a trusted AI platform purpose-built for law firms, reshaping how data drives impactful outcomes. Our innovative approach blends technology with deep legal expertise,... 
    Senior
    Full time
    Work at office
    Flexible hours

    Bonfirevc

    San Francisco, CA
    4 days ago
  • $166.9k - $225.9k

     ...operates as both a central engineering function and an embedded reliability practice. You'll be part...  ...'ll work across a modern cloud‑native stack to help...  ...+ years of experience in Site Reliability Engineering,...  ...Experience with AIOps—using AI/ML‑based tooling for anomaly... 
    Senior
    Flexible hours

    Drata

    San Francisco, CA
    5 days ago
  • Senior Site Reliability Engineer. Hybrid - San Francisco. Our Mission & Values:...  ...operates as both a central engineering function and an embedded reliability...  ...You'll work across a modern cloud-native stack to help Drata...  ...with AIOps - using AI/ML-based tooling for anomaly... 
    Senior
    Work at office
    Immediate start
    Worldwide
    Monday to Friday
    Flexible hours

    Careers at Drata

    San Francisco, CA
    5 days ago
  • $175k - $250k

     ...250,000.00/yr Job Title: Senior Cloud Infrastructure Engineer Location: San Francisco, CA...  ...unavailable. Modality: On-Site only. Must live within...  ...interact with generative AI. They are the team behind...  ...scalability, performance, and reliability across environments. What... 
    Senior
    Full time
    Remote work
    Relocation
    Relocation package

    The Recruiting Guy

    San Francisco, CA
    5 days ago
  •  ...information, please read our...  ...Senior Site Reliability Engineer. Locations:...  ...secure infrastructure, while ensuring cloud-native best practices; Collaborate...  ...in Python supported by Gen AI tooling to accelerate development of... 
    Senior
    Immediate start
    Remote work
    Worldwide

    OutSystems Inc.

    San Francisco, CA
    4 days ago
  •  ...the data and action layer for AI agents. We give agents fast,...  ...now includes AI agents. Engineering Hiring Sprint We're growing...  ...Engineers Database Engineers Site Reliability Engineers Extensibility API...  ...across multiple regions and clouds. You’ll build and maintain the... 
    Senior
    Work at office
    Local area
    Flexible hours

    Airbyte

    San Francisco, CA
    1 day ago
  • $127k - $249k

    The Team Platform Engineering is the department within SRE that is responsible...  .... Among these are our multi-cloud-provider Kubernetes...  ...components that ensure cluster reliability and security (e.g., CoreDNS,...  ...redefined the database for the AI era, enabling innovators to... 
    Senior
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    5 days ago
  • $300 per month

     ...We’re crafting the engine that powers a world...  ...create ambitiously with AI — without...  ...responsible, transformative cloud infrastructure....  ...building the most reliable, energy-efficient,...  ...that mission. As a Site Reliability Engineer...  ...partner closely with senior SREs,... 
    Senior
    Temporary work

    Dormont Manufacturing Co

    San Francisco, CA
    1 day ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda Andromeda Cluster was founded...  ...Andromeda works with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed... 
    Senior
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    4 days ago
  • $220k - $235k

     ...strategic, high-output Staff/Senior Staff SRE to define the future of our cloud platform and champion engineering excellence across Ironclad....  ...direction for the Site Reliability Engineering team and our broader...  ...ArgoCD Experience with modern AI enabled tools such as... 
    Senior
    Full time
    Work at office

    jobr.pro

    San Francisco, CA
    1 day ago
  • $151.5k - $252.5k

    A leading technology firm is seeking a Senior Site Reliability Engineer to join their Data Cloud engineering team in San Francisco. The role requires expertise in Azure infrastructure and SaaS applications, focusing on building reliable, scalable systems. The ideal candidate... 
    Senior

    Veeam

    San Francisco, CA
    5 days ago
  • $181k - $263k

    Senior Staff Site Reliability Engineer. Locations: San Francisco. Time type: Full...  ...Staff Site Reliability Engineer who will set the technical...  ...strategy across Kubernetes, cloud resources, and database infrastructure...  ...Familiarity with LLMs and AI-assisted development... 
    Senior
    Work from home
    Flexible hours
    Night shift

    LiveRamp

    San Francisco, CA
    4 days ago
  •  ...Alembic is the pioneering Causal AI platform. We help the world's...  ...perform under real-world scale, reliability, and security demands — and we're looking for an engineer who wants to own the foundation...  ...across our network and cloud infrastructure. Partner across... 
    Senior

    Mxv

    San Francisco, CA
    4 days ago
  • $232k - $319k

     ...Secure Every Identity, from AI to Human Identity is the key...  ...service with great people and reliable, cost-effective, and efficient...  ...partnership with architects and product engineering Build a world-class...  ...of scalable, self-service Cloud infrastructure platforms (e.g.... 
    Senior
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta

    San Francisco, CA
    more than 2 months ago
  •  ...Udaip Cloud-Based Data And Ai Platform Engineer At U.S. Bank, we're on a journey to do our best. Helping the customers and businesses we serve to make better and smarter financial decisions and enabling the communities we support to grow and succeed. We believe it takes... 
    Senior
    Temporary work
    Work experience placement

    Phenom People

    San Francisco, CA
    3 days ago
  • $261k - $326k

    A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands over 15 years of experience in production engineering or related fields and involves setting technical directions... 
    Senior

    Crusoe

    San Francisco, CA
    4 days ago
  • Drata is seeking a Senior Site Reliability Engineer in San Francisco. In this role, you will engage in reliability architecture for product teams,...  ...ideal candidate has at least 6 years of experience in SRE or Cloud Engineering, expertise in Terraform and Datadog, and is... 
    Senior

    Careers at Drata

    San Francisco, CA
    5 days ago
  •  ...building the category-defining AI workflow automation platform that...  ...We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that...  ...operating production systems in cloud environments, ideally AWS. ~... 
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    2 days ago
