Site Reliability Engineer - Scale & Observability
gamma.app
A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and strong programming skills. You will manage production systems' reliability and lead incident response efforts to prevent issues, all while contributing to the scalability and efficiency of their services. Ideal candidates will have 5+ years of relevant experience and a passion for leveraging technology to drive outcomes.
- A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations...
- Happyrobot Inc. is looking for an Infrastructure Engineer in San Francisco, California. This role involves leading the stability and observability of systems while debugging complex issues as they arise. Candidates should have over 3 years of experience with production...
- ...’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems... ...will influence how we build, scale and operate our platform as we... ...What you’ll do Reliability, Observability and Performance: Maintain and... Work at office · Remote work · Flexible hours · 2 days per week
$194k - $267k
...are too, let's talk. We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud, to own and expand... ...Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors. Required... Permanent employment · Local area · Worldwide · Flexible hours · $177.19k - $364.8k
Pinterest is seeking a Staff Software Engineer to join the Observability team. This role involves designing and building observability solutions while collaborating with various teams. Ideal candidates will have over 7 years of experience in distributed systems, a Bachelor... Work at office · $170k - $230k
...Site Reliability Engineer (SRE) Palo Alto / San Francisco Bay Area About Mithril Mithril is... ...infrastructure hire that will shape how Mithril scales its platform across a heterogeneous,... ...— you will build the automation, observability, and tooling that allows Mithril to... Work at office · Local area · 1 day per week · $106k - $130k
...infrastructure and to be responsible for reliability, automation and scalability using and... ...scenarios. Implement and evangelize Observability and monitoring systems to proactively detect... ...and recommend an efficient path to scale for future needs. Identify... Hourly pay · Work experience placement · Work at office · Immediate start · Visa sponsorship · Work visa · Flexible hours
- ...About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace... ...affordable, accessible AI compute at scale. Who You Are Architected,... ...rollback mechanisms Proficient in observability tools and practices including metrics...
- ...significantly outperforms individual engineers. We combine language models... ...are seeking an experienced Site Reliability Engineer to join our... ...to deploy, monitor, and scale our services reliably. As... ...monitoring, alerting, and observability solutions using Datadog and...
- ...in the systems they read from, not just observe them. We started as the open-source... ...proved the economics of data integration at scale: hundreds of connectors, thousands of... ...You'll be the infrastructure and reliability engineer on the Data Replication team - a full-stack... Work at office · Local area · Remote work · Flexible hours
- ...workflows. We're actively hiring as we continue to scale. About the role We're hiring a Site Reliability Engineer (SRE) to ensure the reliability, performance,... ...to reduce MTTR (Mean Time to Recovery) Observability & System Insight Design and evolve observability... Work at office · Remote work · Flexible hours · 2 days per week
$170k - $250k
...Site Reliability Engineer (SRE) Location: San Francisco, CA / Palo Alto, CA Company Stage of... ...engineering to build the automation, observability, and platform infrastructure that powers... ...their multi-cloud GPU marketplace at scale. What You Will Do Design,... Work at office · Visa sponsorship · Flexible hours · $150k
...Site Reliability Engineer San Francisco, CA About The Role We are seeking an experienced Site... ...alerting and dashboards using observability tooling (e.g., CloudWatch, Datadog, Grafana... ...and vulnerability remediation at scale, including OS-level patching (Amazon...
- ...in San Francisco seeks infrastructure engineers to enhance the tooling and systems... ...include building GPU orchestration, scaling cloud batchjob systems, and designing... ...infrastructure and a strong focus on reliability and observability. This position is in-person, and international... Visa sponsorship
- ...Francisco is seeking a skilled Research Engineer to ensure the reliability and infrastructure integrity of AI... ...with the ability to enhance observability across training environments. The role... ...maintain system stability as demand scales. United States...
- ...Udaip Cloud-Based Data And Ai Platform Engineer At U.S. Bank, we're on a journey to... ...maintain automation scripts for deployment, scaling, and monitoring Collaborate with... ...Docker Containers, and Splunk Logging & Observability Experience with Linux... Temporary work · Work experience placement
- ...more information, please read our... Senior Site Reliability Engineer... ...teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable... ...organizations to build, deploy, and scale the next generation of enterprise software... Immediate start · Remote work · Worldwide
- What you’ll do As a Senior Site Reliability Engineer, you’ll work closely with product teams in Spend... ...readiness. Lead incident response, observability, and automation across critical systems... ...Able to lead SRE strategy for large‑scale, cross‑functional projects. Strong...
$117k - $209.33k
...help make a better world? As a Senior Site Reliability Engineer at Autodesk, you will build and... ...monitoring, alerting, logging, tracing, and observability capabilities across supported... ..., runbooks, and recovery procedures. Scale and enhance resilience testing and Gameday... $151.5k - $252.5k
...enable the acceleration of safe AI at scale. As the market leader in both data... ...are looking for an experienced Senior Site Reliability Engineer to join the Veeam Data Cloud (VDC) engineering... ..., the Serverless Framework, etc.) Observability (Azure Monitor, AppInsights, Elastic... Base plus commission · Local area · Worldwide
- ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes from... ...Senior SRE, you'll tackle the scaling and reliability challenges that come with adding... ...services, and building the automation and observability that keep Unify fast and reliable at...
- ...onboard services and teams to the reliability tenets. Establish and... ...development teams to build resilient, observable, fault‑tolerant, recoverable... .... 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale. History of end‑to‑end...
- ...We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep... ...-trust role where you’ll shape how reliability is done - reducing incident load, building... Worldwide · Shift work
$166.9k - $225.9k
...operates as both a central engineering function and an embedded reliability practice. You'll be part... ...stack to help Drata scale reliably for a rapidly growing... ...—SLO templates, observability checklists, alerting standards... ...years of experience in Site Reliability Engineering,... Flexible hours
- ...role We're looking for a world-class Site Reliability Engineer to ensure the reliability, performance... ...systems that power agentic AI at scale. Your mission: keep our ultra-low-latency... ...our reliability posture end-to-end—observability, performance tuning, incident ops, infrastructure...
- ...users while enabling our engineering teams to ship fast.... ...tooling that improves reliability and partnering with engineering... ...systems that are observable, resilient, and easy... ...help shape how Gamma scales to serve its next 100... ...ll bring 5+ years in Site Reliability Engineering... Work at office · Work from home
- Sr. Site Reliability Engineer Job type: Full Time · Department: Platform · Work type: On-Site San Francisco... ...AI pilots into a unified, enterprise-scale program that delivers measurable... ..., while providing the control and observability needed to scale safely. Built for real... Full time · Remote work
$60 per hour
Senior Site Reliability Engineer (Copy) Seattle Hybrid (Hybrid location). Full-time. About Us Supio... ...bringing total funding to $91 M. We’re scaling rapidly and looking for exceptional... ...coordination. Build safe, repeatable, and observable workflows. GitHub Operations: Manage... Full time · Work at office · Flexible hours
- Senior Site Reliability Engineer Hybrid - San Francisco Our Mission &... ...operates as both a central engineering function and an embedded reliability... ...native stack to help Drata scale reliably for a rapidly... ...artifacts - SLO templates, observability checklists, alerting... Work at office · Immediate start · Worldwide · Monday to Friday · Flexible hours
$140k - $220k
About the Job You’ll own reliability and operational excellence for... ...incident response processes that scale as we grow. You'll build tooling that makes the entire engineering team more effective,... ...implement monitoring, alerting, and observability systems from first...


