Site Reliability Engineer

HappyRobot

About HappyRobot

HappyRobot is the infrastructure for enterprises to build and orchestrate AI workforces. Our AI workers don't just communicate - they make decisions, take action, and run operations autonomously across voice, email, and enterprise systems. Born in Y Combinator (S23) and backed by a16z and Base10 with over $60M raised, we power critical operations for global enterprises.

Our platform is battle-tested in the most demanding environments - where AI has real consequences. We started in logistics, built our own voice stack, models, and orchestration layer from the ground up, and are now bringing that infrastructure to every enterprise that runs the real economy. Learn more about our vision in our manifesto.

About the Role

We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You'll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you'll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

Must-Have

3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
Strong problem-solving skills and ability to dive into unfamiliar backend codebases
Strong Go and Kubernetes experience
Familiarity with observability and monitoring tools (e.g., Grafana, Prometheus, Sentry)
Clear, calm communication under pressure - especially during live incidents

Nice-to-Have

Experience working with distributed systems or services at scale
Built or maintained internal tooling for on-call teams or reliability workflows
Familiarity with deployment pipelines, CI/CD, or infra-as-code
Experience improving system observability (e.g., custom metrics, traces, log pipelines)

Why join us?

Opportunity to work at a high-growth AI startup, backed by top investors.
Fast Growth - Backed by a16z and YC, on track for double-digit ARR.
Top-Tier Compensation - Competitive salary + equity in a high-growth startup.
Ownership & Autonomy - Take full ownership of projects and ship fast.
Work With the Best - Join a world-class team of engineers and builders.

Our Operating Principles

Extreme Ownership

We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting - if something needs fixing, we own it and make it better. This means stepping up, even when it's not "your job." If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it. We don't wait for someone else to solve it - we lead with accountability and expect the same from those around us.

Craftsmanship

Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for "just fine." We sweat the details because details compound. Whether it's a product feature, an internal doc, or a sales call - we treat it as a reflection of our standards. We aim to deliver jaw-dropping customer experiences by being curious, meticulous, and proud of what we build - even when nobody's watching.

We are "majos"

Be friendly & have fun with your coworkers. Always be genuine & honest, but kind. "Majo" is our way of saying: be a good human. Be approachable, helpful, and warm. We're building something ambitious, and it's easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego.

Urgency with Focus

Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high leverage: the biggest results from the simplest, smartest actions. We're running a high-speed marathon - not a sprint with no strategy.

Talent Density and Meritocracy

Hire only people who can raise the average; 'exceptional performance is the passing grade.' Ability trumps seniority. We believe the best teams are built on talent density - every hire should raise the bar. We reward contribution, not titles or tenure. We give ownership to those who earn it, and we all hold each other to a high standard. A-players want to work with other A-players - that's how we win.

First-Principles Thinking

Strip a problem to physics-level facts, ignore industry dogma, rebuild the solution from scratch. We don't copy-paste solutions. We go back to basics, ask why things are the way they are, and rebuild from the ground up if needed. This mindset pushes us to innovate, challenge stale assumptions, and move faster than incumbents. It's how we build what others think is impossible.

The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller.

By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer.

Regarding the retention period, your personal data will be deleted after three months of inactivity, in compliance with the GDPR and applicable data protection legislation.

If you wish to exercise your rights of access, rectification, deletion, portability, or objection regarding your personal data, you can do so by email, in accordance with the GDPR.

For more information, visit

By submitting your request, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.

Vacancy posted 3 days ago
