Lead Site Reliability Engineer

$200k - $275k

Stuut

Job Description

Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for businesses that have historically relied on labor-intensive, costly manual processes. Our platform is gaining traction with finance teams across the industrials, chemicals, and manufacturing sectors, from Fortune 10 brands to scaling mid-market companies. We're backed by top-tier investors including a16z, Khosla, Activant, 1984 Ventures, and Page One.

The Role

We’re hiring a Lead Site Reliability Engineer to drive the strategy, architecture, and execution of reliability, scalability, and operational excellence across our platform. You’ll build and scale the systems that keep Stuut highly available, performant, and resilient as we grow customers, traffic, and complexity.

From defining SLOs and reliability standards to hardening infrastructure, improving observability, and guiding teams through incident response and postmortems, you’ll own the engineering rigor that allows us to ship quickly without sacrificing stability. You’ll turn strong reliability engineering into real customer trust, creating the guardrails that let product and engineering move fast with confidence.

This is a hands-on technical leadership role for an engineer who excels at designing reliable distributed systems, influencing engineering practices, and leading high-impact reliability initiatives across teams.

What You’ll Do

Set the Reliability Strategy: define the long-term vision for site reliability, including SLOs/SLIs, error budgets, availability targets, and operational standards.
Build & Scale Reliable Infrastructure: architect and maintain resilient, scalable cloud infrastructure across AWS and Kubernetes, ensuring systems are secure, fault-tolerant, and cost-effective.
Own Observability & Monitoring: design and evolve monitoring, alerting, and logging systems that provide clear, actionable signals across services and environments.
Lead Incident Response & Postmortems: own incident management practices, lead major incident response, and drive blameless postmortems that result in meaningful system improvements.
Improve System Resilience: identify reliability risks and lead efforts around redundancy, failover, capacity planning, and graceful degradation.
Optimize CI/CD & Deployment Reliability: partner with engineering teams to ensure deployments are safe, observable, and reversible; improve rollout strategies and reduce operational risk.
Partner with Product & Engineering Teams: collaborate early in the development lifecycle to influence system design, scalability, and reliability tradeoffs.
Reduce Toil & Improve Developer Experience: automate operational tasks, improve runbooks, and build tooling that reduces manual work and accelerates safe execution.
Drive Root Cause Resolution: guide teams through deep debugging of reliability issues, ensuring fixes address underlying causes rather than symptoms.
Influence Reliability Culture: promote reliability-first thinking, strong operational hygiene, and shared ownership of production systems across engineering.
Mentor & Level Up the Team: coach engineers on reliability principles, incident handling, infrastructure design, and operational best practices.

You Might Be a Fit If You…

Have 7+ years of experience in site reliability engineering, infrastructure engineering, or backend software engineering.
Have designed and operated highly available, production-grade systems supporting rapid product iteration.
Are fluent in Python and/or TypeScript, and comfortable building automation and tooling to support reliability goals.
Have deep experience with AWS, Kubernetes (EKS), Docker, and cloud-native architectures.
Have implemented and evolved observability stacks (metrics, logs, traces) and know how to create high-signal alerting.
Understand how to design, measure, and enforce SLOs, SLIs, and error budgets.
Have supported systems built with modern stacks such as FastAPI, Vue.js, PostgreSQL (RDS), and event-driven architectures.
Have improved reliability and operational maturity in environments using CI/CD pipelines, infrastructure as code, and modern deployment workflows.
Can balance reliability, velocity, and cost — making pragmatic tradeoffs that serve customers and the business.
Enjoy collaborating across Product, Backend, Frontend, and Infrastructure teams to improve system health.
Thrive in a role that blends deep technical execution, system design, and leadership influence in a fast-moving environment.

Compensation

Top-of-market salary and equity package
Benefits (for U.S.-based full-time employees)
Medical, dental & vision insurance coverage for you
401(k) & Match
Equity
Flexible PTO
Parental Leave

Compensation Range: $200K - $275K

Vacancy posted a month ago
