Lead Site Reliability Engineer
$200k - $275k
Stuut
Job Description
Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for businesses that have historically relied on labor-intensive, costly manual processes. Our platform is gaining traction with finance teams across the industrials, chemicals, and manufacturing sectors, from Fortune 10 brands to scaling mid-market companies. We're backed by top-tier investors including a16z, Khosla, Activant, 1984 Ventures, and Page One.
The Role
We’re hiring a Lead Site Reliability Engineer to drive the strategy, architecture, and execution of reliability, scalability, and operational excellence across our platform. You’ll build and scale the systems that keep Stuut highly available, performant, and resilient as we grow customers, traffic, and complexity.
From defining SLOs and reliability standards to hardening infrastructure, improving observability, and guiding teams through incident response and postmortems, you’ll own the engineering rigor that allows us to ship quickly without sacrificing stability. You’ll turn strong reliability engineering into real customer trust by creating the guardrails that let product and engineering move fast with confidence.
This is a hands-on technical leadership role for an engineer who excels at designing reliable distributed systems, influencing engineering practices, and leading high-impact reliability initiatives across teams.
What You’ll Do
Set the Reliability Strategy: define the long-term vision for site reliability, including SLOs/SLIs, error budgets, availability targets, and operational standards.
Build & Scale Reliable Infrastructure: architect and maintain resilient, scalable cloud infrastructure across AWS and Kubernetes, ensuring systems are secure, fault-tolerant, and cost-effective.
Own Observability & Monitoring: design and evolve monitoring, alerting, and logging systems that provide clear, actionable signals across services and environments.
Lead Incident Response & Postmortems: own incident management practices, lead major incident response, and drive blameless postmortems that result in meaningful system improvements.
Improve System Resilience: identify reliability risks and lead efforts around redundancy, failover, capacity planning, and graceful degradation.
Optimize CI/CD & Deployment Reliability: partner with engineering teams to ensure deployments are safe, observable, and reversible; improve rollout strategies and reduce operational risk.
Partner with Product & Engineering Teams: collaborate early in the development lifecycle to influence system design, scalability, and reliability tradeoffs.
Reduce Toil & Improve Developer Experience: automate operational tasks, improve runbooks, and build tooling that reduces manual work and accelerates safe execution.
Drive Root Cause Resolution: guide teams through deep debugging of reliability issues, ensuring fixes address underlying causes rather than symptoms.
Influence Reliability Culture: promote reliability-first thinking, strong operational hygiene, and shared ownership of production systems across engineering.
Mentor & Level Up the Team: coach engineers on reliability principles, incident handling, infrastructure design, and operational best practices.
You Might Be a Fit If You…
Have 7+ years of experience in site reliability engineering, infrastructure engineering, or backend software engineering.
Have designed and operated highly available, production-grade systems supporting rapid product iteration.
Are fluent in Python and/or TypeScript, and comfortable building automation and tooling to support reliability goals.
Have deep experience with AWS, Kubernetes (EKS), Docker, and cloud-native architectures.
Have implemented and evolved observability stacks (metrics, logs, traces) and know how to create high-signal alerting.
Understand how to design, measure, and enforce SLOs, SLIs, and error budgets.
Have supported systems built with modern stacks such as FastAPI, Vue.js, PostgreSQL (RDS), and event-driven architectures.
Have improved reliability and operational maturity in environments using CI/CD pipelines, infrastructure as code, and modern deployment workflows.
Can balance reliability, velocity, and cost — making pragmatic tradeoffs that serve customers and the business.
Enjoy collaborating across Product, Backend, Frontend, and Infrastructure teams to improve system health.
Thrive in a role that blends deep technical execution, system design, and leadership influence in a fast-moving environment.
Compensation
Top-of-market salary and equity package
Benefits (for U.S.-based full-time employees)
Medical, dental & vision insurance coverage for you
401(k) & Match
Equity
Flexible PTO
Parental Leave
Compensation Range: $200K - $275K


