Lead Site Reliability Engineer

$200k - $275k

Stuut

Job Description

Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for companies that have historically relied on labor-intensive, costly manual processes. Our platform is gaining traction with finance teams across the industrials, chemicals, and manufacturing sectors, from Fortune 10 brands to scaling mid-market companies. We're backed by top-tier investors including a16z, Khosla, Activant, 1984 Ventures, and Page One.

The Role

We’re hiring a Lead Site Reliability Engineer to drive the strategy, architecture, and execution of reliability, scalability, and operational excellence across our platform. You’ll build and scale the systems that keep Stuut highly available, performant, and resilient as we grow customers, traffic, and complexity.

From defining SLOs and reliability standards to hardening infrastructure, improving observability, and guiding teams through incident response and postmortems, you’ll own the engineering rigor that allows us to ship quickly without sacrificing stability. You’ll turn strong reliability engineering into real customer trust, creating the guardrails that let product and engineering move fast with confidence.

This is a hands-on technical leadership role for an engineer who excels at designing reliable distributed systems, influencing engineering practices, and leading high-impact reliability initiatives across teams.

What You’ll Do

  • Set the Reliability Strategy: define the long-term vision for site reliability, including SLOs/SLIs, error budgets, availability targets, and operational standards.

  • Build & Scale Reliable Infrastructure: architect and maintain resilient, scalable cloud infrastructure across AWS and Kubernetes, ensuring systems are secure, fault-tolerant, and cost-effective.

  • Own Observability & Monitoring: design and evolve monitoring, alerting, and logging systems that provide clear, actionable signals across services and environments.

  • Lead Incident Response & Postmortems: own incident management practices, lead major incident response, and drive blameless postmortems that result in meaningful system improvements.

  • Improve System Resilience: identify reliability risks and lead efforts around redundancy, failover, capacity planning, and graceful degradation.

  • Optimize CI/CD & Deployment Reliability: partner with engineering teams to ensure deployments are safe, observable, and reversible; improve rollout strategies and reduce operational risk.

  • Partner with Product & Engineering Teams: collaborate early in the development lifecycle to influence system design, scalability, and reliability tradeoffs.

  • Reduce Toil & Improve Developer Experience: automate operational tasks, improve runbooks, and build tooling that reduces manual work and accelerates safe execution.

  • Drive Root Cause Resolution: guide teams through deep debugging of reliability issues, ensuring fixes address underlying causes rather than symptoms.

  • Influence Reliability Culture: promote reliability-first thinking, strong operational hygiene, and shared ownership of production systems across engineering.

  • Mentor & Level Up the Team: coach engineers on reliability principles, incident handling, infrastructure design, and operational best practices.

You Might Be a Fit If You…

  • Have 7+ years of experience in site reliability engineering, infrastructure engineering, or backend software engineering.

  • Have designed and operated highly available, production-grade systems supporting rapid product iteration.

  • Are fluent in Python and/or TypeScript, and comfortable building automation and tooling to support reliability goals.

  • Have deep experience with AWS, Kubernetes (EKS), Docker, and cloud-native architectures.

  • Have implemented and evolved observability stacks (metrics, logs, traces) and know how to create high-signal alerting.

  • Understand how to design, measure, and enforce SLOs, SLIs, and error budgets.

  • Have supported systems built with modern stacks such as FastAPI, Vue.js, PostgreSQL (RDS), and event-driven architectures.

  • Have improved reliability and operational maturity in environments using CI/CD pipelines, infrastructure as code, and modern deployment workflows.

  • Can balance reliability, velocity, and cost — making pragmatic tradeoffs that serve customers and the business.

  • Enjoy collaborating across Product, Backend, Frontend, and Infrastructure teams to improve system health.

  • Thrive in a role that blends deep technical execution, system design, and leadership influence in a fast-moving environment.

Compensation

  • Top-of-market salary and equity package

  • Benefits (for U.S.-based full-time employees)

  • Medical, dental & vision insurance coverage for you

  • 401(k) & Match

  • Equity

  • Flexible PTO

  • Parental Leave

Compensation Range: $200K - $275K

Vacancy posted 10 days ago