Site Reliability Engineer - Scale & Observability

gamma.app

A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and strong programming skills. You will manage production systems' reliability and lead incident response efforts to prevent issues, all while contributing to the scalability and efficiency of their services. Ideal candidates will have 5+ years of relevant experience and a passion for leveraging technology to drive outcomes.

Vacancy posted 2 days ago

Similar jobs that could be interesting for you
Based on the Site Reliability Engineer - Scale & Observability in San Francisco, CA vacancy

Reliability Engineer: Scale Systems, Observe & Automate
A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations...
Suggested
OpenAI
San Francisco, CA
4 days ago
Site Reliability Engineer — Scale AI Infra with Ownership
Happyrobot Inc. is looking for an Infrastructure Engineer in San Francisco, California. This role involves leading the stability and observability of systems while debugging complex issues as they arise. Candidates should have over 3 years of experience with production...
Suggested
Happyrobot Inc.
San Francisco, CA
4 days ago
Site Reliability Engineer
...’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems... ...will influence how we build, scale and operate our platform as we... ...What you’ll do Reliability, Observability and Performance: Maintain and...
Suggested
Work at office
Remote work
Flexible hours
2 days per week
Plenful
San Francisco, CA
2 days ago
Staff Site Reliability Engineer - Observability GCP
$194k - $267k
...are too, let's talk. We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud, to own and expand... ...Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors. Required...
Suggested
Permanent employment
Local area
Worldwide
Flexible hours
Okta
San Francisco, CA
16 days ago
Staff Software Engineer, Observability — Scale & Impact (Equity)
$177.19k - $364.8k
Pinterest is seeking a Staff Software Engineer to join the Observability team. This role involves designing and building observability solutions while collaborating with various teams. Ideal candidates will have over 7 years of experience in distributed systems, a Bachelor...
Suggested
Work at office
jobr.pro
San Francisco, CA
4 days ago
Site Reliability Engineer (SRE)
$170k - $230k
...Site Reliability Engineer (SRE) Palo Alto / San Francisco Bay Area About Mithril Mithril is... ...infrastructure hire that will shape how Mithril scales its platform across a heterogeneous,... ...— you will build the automation, observability, and tooling that allows Mithril to...
Work at office
Local area
1 day per week
Mithril
San Francisco, CA
5 days ago
Sr. Site Reliability Engineer
$106k - $130k
...infrastructure and to be responsible for reliability, automation and scalability using and... ...scenarios. Implement and evangelize Observability and monitoring systems to proactively detect... ...and recommend an efficient path to scale for future needs. Identify...
Hourly pay
Work experience placement
Work at office
Immediate start
Visa sponsorship
Work visa
Flexible hours
Early Warning Services
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
...About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace... ...affordable, accessible AI compute at scale. Who You Are Architected,... ...rollback mechanisms Proficient in observability tools and practices including metrics...
Hyperbolic Labs
San Francisco, CA
3 days ago
Senior Site Reliability Engineer
...significantly outperforms individual engineers. We combine language models... ...are seeking an experienced Site Reliability Engineer to join our... ...to deploy, monitor, and scale our services reliably. As... ...monitoring, alerting, and observability solutions using Datadog and...
CodeRabbit
San Francisco, CA
5 days ago
Senior Site Reliability Engineer
...in the systems they read from, not just observe them. We started as the open-source... ...proved the economics of data integration at scale: hundreds of connectors, thousands of... ...You'll be the infrastructure and reliability engineer on the Data Replication team - a full-stack...
Work at office
Local area
Remote work
Flexible hours
Airbyte
San Francisco, CA
3 days ago
Site Reliability Engineer
...workflows. We're actively hiring as we continue to scale. About the role We're hiring a Site Reliability Engineer (SRE) to ensure the reliability, performance,... ...to reduce MTTR (Mean Time to Recovery) Observability & System Insight Design and evolve observability...
Work at office
Remote work
Flexible hours
2 days per week
Plenful
San Francisco, CA
3 days ago
Site Reliability Engineer (SRE)
$170k - $250k
...Site Reliability Engineer (SRE) Location: San Francisco, CA / Palo Alto, CA Company Stage of... ...engineering to build the automation, observability, and platform infrastructure that powers... ...their multi-cloud GPU marketplace at scale. What You Will Do Design,...
Work at office
Visa sponsorship
Flexible hours
Recruiting from Scratch
San Francisco, CA
5 days ago
Site Reliability Engineer
$150k
...Site Reliability Engineer San Francisco, CA About The Role We are seeking an experienced Site... ...alerting and dashboards using observability tooling (e.g., CloudWatch, Datadog, Grafana... ...and vulnerability remediation at scale, including OS-level patching (Amazon...
VantageScore®
San Francisco, CA
4 days ago
GPU Infra Engineer: Scale Massive Clusters & Observability
...in San Francisco seeks infrastructure engineers to enhance the tooling and systems... ...include building GPU orchestration, scaling cloud batch job systems, and designing... ...infrastructure and a strong focus on reliability and observability. This position is in-person, and international...
Visa sponsorship
Exa
San Francisco, CA
6 days ago
RL Infra Engineer - Reliability, Observability & Scale
...Francisco is seeking a skilled Research Engineer to ensure the reliability and infrastructure integrity of AI... ...with the ability to enhance observability across training environments. The role... ...maintain system stability as demand scales.
United States Digital Space LLC
San Francisco, CA
4 days ago
Senior Software Engineer - Site Reliability Engineering
...Udaip Cloud-Based Data And Ai Platform Engineer At U.S. Bank, we're on a journey to... ...maintain automation scripts for deployment, scaling, and monitoring Collaborate with... ...Docker Containers, and Splunk Logging & Observability Experience with Linux...
Temporary work
Work experience placement
Phenom People
San Francisco, CA
3 days ago
Senior Site Reliability Engineer
...more information, please read our... ...Senior Site... ...teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable... ...organizations to build, deploy, and scale the next generation of enterprise software...
Immediate start
Remote work
Worldwide
OutSystems Inc.
San Francisco, CA
4 days ago
Senior Site Reliability Engineer, Spend
What you’ll do As a Senior Site Reliability Engineer, you’ll work closely with product teams in Spend... ...readiness. Lead incident response, observability, and automation across critical systems... ...Able to lead SRE strategy for large‑scale, cross‑functional projects. Strong...
Airwallex
San Francisco, CA
3 days ago
Senior Site Reliability Engineer
$117k - $209.33k
...help make a better world? As a Senior Site Reliability Engineer at Autodesk, you will build and... ...monitoring, alerting, logging, tracing, and observability capabilities across supported... ..., runbooks, and recovery procedures. Scale and enhance resilience testing and Gameday...
Autodesk, Inc.
San Francisco, CA
2 days ago
Site Reliability Engineer III
$151.5k - $252.5k
...enable the acceleration of safe AI at scale. As the market leader in both data... ...are looking for an experienced Senior Site Reliability Engineer to join the Veeam Data Cloud (VDC) engineering... ..., the Serverless Framework, etc.) Observability (Azure Monitor, AppInsights, Elastic...
Base plus commission
Local area
Worldwide
Veeam
San Francisco, CA
5 days ago
Senior Site Reliability Engineer
...Connor was a machine learning research engineer at Scale AI. The rest of our team comes from... ...Senior SRE, you'll tackle the scaling and reliability challenges that come with adding... ...services, and building the automation and observability that keep Unify fast and reliable at...
Unify
San Francisco, CA
4 days ago
Senior Site Reliability Engineer
...onboard services and teams to the reliability tenets. Establish and... ...development teams to build resilient, observable, fault‑tolerant, recoverable... .... 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale. History of end‑to‑end...
OutSystems, Inc.
San Francisco, CA
4 days ago
Site Reliability Engineer
...We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep... ...-trust role where you’ll shape how reliability is done - reducing incident load, building...
Worldwide
Shift work
Happyrobot Inc.
San Francisco, CA
4 days ago
Senior Site Reliability Engineer
$166.9k - $225.9k
...operates as both a central engineering function and an embedded reliability practice. You'll be part... ...stack to help Drata scale reliably for a rapidly growing... ...—SLO templates, observability checklists, alerting standards... ...years of experience in Site Reliability Engineering,...
Flexible hours
Drata
San Francisco, CA
5 days ago
Site Reliability Engineer
...role We're looking for a world-class Site Reliability Engineer to ensure the reliability, performance... ...systems that power agentic AI at scale. Your mission: keep our ultra-low-latency... ...our reliability posture end-to-end—observability, performance tuning, incident ops, infrastructure...
Blaxel
San Francisco, CA
5 days ago
Site Reliability Engineer
...users while enabling our engineering teams to ship fast.... ...tooling that improves reliability and partnering with engineering... ...systems that are observable, resilient, and easy... ...help shape how Gamma scales to serve its next 100... ...ll bring 5+ years in Site Reliability Engineering...
Work at office
Work from home
gamma.app
San Francisco, CA
2 days ago
Sr. Site Reliability Engineer
Sr. Site Reliability Engineer Job type: Full Time · Department: Platform · Work type: On-Site San Francisco... ...AI pilots into a unified, enterprise-scale program that delivers measurable... ..., while providing the control and observability needed to scale safely. Built for real...
Full time
Remote work
Neara
San Francisco, CA
5 days ago
Senior Site Reliability Engineer
$60 per hour
Senior Site Reliability Engineer (Copy) Seattle Hybrid (Hybrid location). Full-time. About Us Supio... ...bringing total funding to $91 M. We’re scaling rapidly and looking for exceptional... ...coordination. Build safe, repeatable, and observable workflows. GitHub Operations: Manage...
Full time
Work at office
Flexible hours
Bonfirevc
San Francisco, CA
4 days ago
Senior Site Reliability Engineer
Senior Site Reliability Engineer Hybrid - San Francisco Our Mission &... ...operates as both a central engineering function and an embedded reliability... ...native stack to help Drata scale reliably for a rapidly... ...artifacts - SLO templates, observability checklists, alerting...
Work at office
Immediate start
Worldwide
Monday to Friday
Flexible hours
Careers at Drata
San Francisco, CA
5 days ago
Senior Site Reliability Engineer
$140k - $220k
About the Job You’ll own reliability and operational excellence for... ...incident response processes that scale as we grow. You'll build tooling that makes the entire engineering team more effective,... ...implement monitoring, alerting, and observability systems from first...
Pylon
San Francisco, CA
2 days ago
