Site Reliability Engineer - Scale & Observability

gamma.app

A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and strong programming skills. You will manage production systems' reliability and lead incident response efforts to prevent issues, all while contributing to the scalability and efficiency of their services. Ideal candidates will have 5+ years of relevant experience and a passion for leveraging technology to drive outcomes.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Site Reliability Engineer - Scale & Observability in San Francisco, CA vacancy
  • A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations... 
    Suggested

    OpenAI

    San Francisco, CA
    4 days ago
  • Happyrobot Inc. is looking for an Infrastructure Engineer in San Francisco, California. This role involves leading the stability and observability of systems while debugging complex issues as they arise. Candidates should have over 3 years of experience with production... 
    Suggested

    Happyrobot Inc.

    San Francisco, CA
    4 days ago
  •  ...’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems...  ...will influence how we build, scale and operate our platform as we...  ...What you’ll do Reliability, Observability and Performance: Maintain and... 
    Suggested
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    2 days ago
  • $194k - $267k

     ...are too, let's talk. We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud, to own and expand...  ...Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors. Required... 
    Suggested
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta

    San Francisco, CA
    16 days ago
  • $177.19k - $364.8k

    Pinterest is seeking a Staff Software Engineer to join the Observability team. This role involves designing and building observability solutions while collaborating with various teams. Ideal candidates will have over 7 years of experience in distributed systems, a Bachelor... 
    Suggested
    Work at office

    jobr.pro

    San Francisco, CA
    4 days ago
  • $170k - $230k

     ...Site Reliability Engineer (SRE) Palo Alto / San Francisco Bay Area About Mithril Mithril is...  ...infrastructure hire that will shape how Mithril scales its platform across a heterogeneous,...  ...— you will build the automation, observability, and tooling that allows Mithril to... 
    Work at office
    Local area
    1 day per week

    Mithril

    San Francisco, CA
    5 days ago
  • $106k - $130k

     ...infrastructure and to be responsible for reliability, automation and scalability using and...  ...scenarios. Implement and evangelize Observability and monitoring systems to proactively detect...  ...and recommend an efficient path to scale for future needs. Identify... 
    Hourly pay
    Work experience placement
    Work at office
    Immediate start
    Visa sponsorship
    Work visa
    Flexible hours

    Early Warning Services

    San Francisco, CA
    2 days ago
  •  ...About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace...  ...affordable, accessible AI compute at scale. Who You Are Architected,...  ...rollback mechanisms Proficient in observability tools and practices including metrics... 

    Hyperbolic Labs

    San Francisco, CA
    3 days ago
  •  ...significantly outperforms individual engineers. We combine language models...  ...are seeking an experienced Site Reliability Engineer to join our...  ...to deploy, monitor, and scale our services reliably. As...  ...monitoring, alerting, and observability solutions using Datadog and... 

    CodeRabbit

    San Francisco, CA
    5 days ago
  •  ...in the systems they read from, not just observe them. We started as the open-source...  ...proved the economics of data integration at scale: hundreds of connectors, thousands of...  ...You'll be the infrastructure and reliability engineer on the Data Replication team - a full-stack... 
    Work at office
    Local area
    Remote work
    Flexible hours

    Airbyte

    San Francisco, CA
    3 days ago
  •  ...workflows. We're actively hiring as we continue to scale. About the role We're hiring a Site Reliability Engineer (SRE) to ensure the reliability, performance,...  ...to reduce MTTR (Mean Time to Recovery) Observability & System Insight Design and evolve observability... 
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    3 days ago
  • $170k - $250k

     ...Site Reliability Engineer (SRE) Location: San Francisco, CA / Palo Alto, CA Company Stage of...  ...engineering to build the automation, observability, and platform infrastructure that powers...  ...their multi-cloud GPU marketplace at scale. What You Will Do Design,... 
    Work at office
    Visa sponsorship
    Flexible hours

    Recruiting from Scratch

    San Francisco, CA
    5 days ago
  • $150k

     ...Site Reliability Engineer San Francisco, CA About The Role We are seeking an experienced Site...  ...alerting and dashboards using observability tooling (e.g., CloudWatch, Datadog, Grafana...  ...and vulnerability remediation at scale, including OS-level patching (Amazon... 

    VantageScore®

    San Francisco, CA
    4 days ago
  •  ...in San Francisco seeks infrastructure engineers to enhance the tooling and systems...  ...include building GPU orchestration, scaling cloud batch job systems, and designing...  ...infrastructure and a strong focus on reliability and observability. This position is in-person, and international... 
    Visa sponsorship

    Exa

    San Francisco, CA
    6 days ago
  •  ...Francisco is seeking a skilled Research Engineer to ensure the reliability and infrastructure integrity of AI...  ...with the ability to enhance observability across training environments. The role...  ...maintain system stability as demand scales. United States... 

    United States Digital Space LLC

    San Francisco, CA
    4 days ago
  •  ...Udaip Cloud-Based Data And Ai Platform Engineer At U.S. Bank, we're on a journey to...  ...maintain automation scripts for deployment, scaling, and monitoring Collaborate with...  ...Docker Containers, and Splunk Logging & Observability Experience with Linux... 
    Temporary work
    Work experience placement

    Phenom People

    San Francisco, CA
    3 days ago
  •  ...more information, please read our...  ...Senior Site Reliability Engineer...  ...teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable...  ...organizations to build, deploy, and scale the next generation of enterprise software... 
    Immediate start
    Remote work
    Worldwide

    OutSystems Inc.

    San Francisco, CA
    4 days ago
  • What you’ll do As a Senior Site Reliability Engineer, you’ll work closely with product teams in Spend...  ...readiness. Lead incident response, observability, and automation across critical systems...  ...Able to lead SRE strategy for large‑scale, cross‑functional projects. Strong... 

    Airwallex

    San Francisco, CA
    3 days ago
  • $117k - $209.33k

     ...help make a better world? As a Senior Site Reliability Engineer at Autodesk, you will build and...  ...monitoring, alerting, logging, tracing, and observability capabilities across supported...  ..., runbooks, and recovery procedures. Scale and enhance resilience testing and Gameday... 

    Autodesk, Inc.

    San Francisco, CA
    2 days ago
  • $151.5k - $252.5k

     ...enable the acceleration of safe AI at scale. As the market leader in both data...  ...are looking for an experienced Senior Site Reliability Engineer to join the Veeam Data Cloud (VDC) engineering...  ..., the Serverless Framework, etc.) Observability (Azure Monitor, AppInsights, Elastic... 
    Base plus commission
    Local area
    Worldwide

    Veeam

    San Francisco, CA
    5 days ago
  •  ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes from...  ...Senior SRE, you'll tackle the scaling and reliability challenges that come with adding...  ...services, and building the automation and observability that keep Unify fast and reliable at... 

    Unify

    San Francisco, CA
    4 days ago
  •  ...onboard services and teams to the reliability tenets. Establish and...  ...development teams to build resilient, observable, fault‑tolerant, recoverable...  .... 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale. History of end‑to‑end... 

    OutSystems, Inc.

    San Francisco, CA
    4 days ago
  •  ...We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep...  ...-trust role where you’ll shape how reliability is done - reducing incident load, building... 
    Worldwide
    Shift work

    Happyrobot Inc.

    San Francisco, CA
    4 days ago
  • $166.9k - $225.9k

     ...operates as both a central engineering function and an embedded reliability practice. You'll be part...  ...stack to help Drata scale reliably for a rapidly growing...  ...—SLO templates, observability checklists, alerting standards...  ...years of experience in Site Reliability Engineering,... 
    Flexible hours

    Drata

    San Francisco, CA
    5 days ago
  •  ...role We're looking for a world-class Site Reliability Engineer to ensure the reliability, performance...  ...systems that power agentic AI at scale. Your mission: keep our ultra-low-latency...  ...our reliability posture end-to-end—observability, performance tuning, incident ops, infrastructure... 

    Blaxel

    San Francisco, CA
    5 days ago
  •  ...users while enabling our engineering teams to ship fast....  ...tooling that improves reliability and partnering with engineering...  ...systems that are observable, resilient, and easy...  ...help shape how Gamma scales to serve its next 100...  ...ll bring 5+ years in Site Reliability Engineering... 
    Work at office
    Work from home

    gamma.app

    San Francisco, CA
    2 days ago
  • Sr. Site Reliability Engineer Job type: Full Time · Department: Platform · Work type: On-Site San Francisco...  ...AI pilots into a unified, enterprise-scale program that delivers measurable...  ..., while providing the control and observability needed to scale safely. Built for real... 
    Full time
    Remote work

    Neara

    San Francisco, CA
    5 days ago
  • $60 per hour

    Senior Site Reliability Engineer (Copy) Seattle Hybrid (Hybrid location). Full-time. About Us Supio...  ...bringing total funding to $91 M. We’re scaling rapidly and looking for exceptional...  ...coordination. Build safe, repeatable, and observable workflows. GitHub Operations: Manage... 
    Full time
    Work at office
    Flexible hours

    Bonfirevc

    San Francisco, CA
    4 days ago
  • Senior Site Reliability Engineer, Hybrid - San Francisco. Our Mission &...  ...operates as both a central engineering function and an embedded reliability...  ...native stack to help Drata scale reliably for a rapidly...  ...artifacts - SLO templates, observability checklists, alerting... 
    Work at office
    Immediate start
    Worldwide
    Monday to Friday
    Flexible hours

    Careers at Drata

    San Francisco, CA
    5 days ago
  • $140k - $220k

    About the Job You’ll own reliability and operational excellence for...  ...incident response processes that scale as we grow. You'll build tooling that makes the entire engineering team more effective,...  ...implement monitoring, alerting, and observability systems from first... 

    Pylon

    San Francisco, CA
    2 days ago
