Senior Site Reliability Engineer

CodeRabbit

About CodeRabbit

CodeRabbit is an innovative research and development company focused on building extraordinarily productive human-machine collaboration systems. Our primary goal is to create the next generation of Gen AI-driven code reviewers: a symbiotic partnership between humans and advanced algorithms that significantly outperforms individual engineers. We combine language models with human ingenuity to push the boundaries of software development efficiency and quality.

The Role:

We are seeking an experienced Site Reliability Engineer to join our Platform Engineering team in the Bay Area. You'll be instrumental in ensuring the high availability, performance, and scalability of CodeRabbit's AI-powered code review platform. This role sits at the intersection of software engineering and systems operations, where you'll build the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale our services reliably.

As an SRE at CodeRabbit, you'll be responsible for enhancing the reliability of our critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. You'll work with cutting-edge technologies including large language models, real-time processing systems, and distributed architectures that operate at significant scale.

Key Responsibilities:

Infrastructure & Platform Ownership

Design, implement, and maintain scalable infrastructure on Google Cloud Platform to support CodeRabbit's growing user base and processing demands
Own and operate critical platform services
Build and maintain Infrastructure as Code using Terraform to ensure consistent, reproducible, and version-controlled infrastructure deployments

Reliability & Performance Engineering

Establish and maintain SLI/SLO frameworks for all critical services, ensuring we meet our reliability commitments to users
Implement comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation
Conduct thorough incident response, root cause analysis, and post-mortem processes to continuously improve system reliability
Optimize application and infrastructure performance to handle millions of pull request analyses with minimal latency
Design and implement chaos engineering practices to proactively identify and resolve system weaknesses

Automation & Developer Experience

Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently
Automate operational tasks including scaling, backup/recovery, security patching, and routine maintenance
Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams

Security & Compliance

Integrate security best practices into all infrastructure and platform services
Implement and maintain security monitoring, vulnerability scanning, and compliance reporting
Design secure network architectures including VPC configuration, firewall rules, and access control systems
Establish and maintain disaster recovery procedures and business continuity planning

Required Qualifications:

7+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps Engineering roles
Proven track record of managing production systems at scale, preferably in high-growth technology companies
Experience with cloud platforms, particularly AWS or Google Cloud Platform (GCP), including compute, storage, networking, and managed services
Strong background in containerization and orchestration platforms (Kubernetes, Docker)

Technical Skills

Programming Languages: Proficiency in Node.js and TypeScript for building automation tools, monitoring solutions, and platform services
Infrastructure as Code: Advanced experience with Terraform for infrastructure provisioning and management
Monitoring & Observability: Hands-on experience with Datadog or similar platforms (Prometheus, Grafana, ELK stack) for observability
Cloud Platforms: Comprehensive experience with GCP services including Compute Engine, GKE, Cloud Run, Cloud SQL, Cloud Storage, Load Balancing, and IAM

Systems & Operations

Strong Linux/Unix systems skills
Experience with network protocols, load balancing, and CDN technologies
Knowledge of security principles and best practices for cloud infrastructure
Familiarity with CI/CD tools and practices (Jenkins, GitLab CI, GitHub Actions)
Understanding of microservices architecture and distributed systems principles

Bonus Points:

Experience with AI/ML infrastructure and tools
Background in managing high-traffic web applications and API services
Experience with disaster recovery planning and execution
Familiarity with compliance frameworks (SOC 2, ISO 27001)
Contributions to open-source infrastructure or SRE tooling projects
Experience with cost optimization and FinOps practices
Knowledge of performance testing and capacity planning methodologies

Why Join Our Engineering Culture?

CodeRabbit is building the next generation of AI-native developer tooling - starting with code review. We combine large language models with deep software engineering context to help teams ship faster, catch more bugs, and make better architectural decisions at scale.
We are a high-ownership engineering culture. That means no passive execution, no waiting for perfect tickets, and no narrowly defined task boundaries. Engineers here find problems before they're assigned, use AI as a core part of how they build, ship with judgment, and own outcomes from proposal to production.
Our operating philosophy: bias toward action, ship the smallest necessary coherent slice, validate proportional to risk, watch what happens, and make the system better. AI drafts; humans decide. Speed matters, but so does understanding what you ship.
This opportunity will be energizing for people who want real ownership, pace, and high standards. It's uncomfortable for people who prefer slow consensus or heavily managed workflows.
If you want to build tools that are changing how software gets written, and be held to the standard that the best engineers thrive under, we'd love to talk.

Our Values

Collaborative Humans: Prioritizing collective intelligence
Fearless Innovators: Turning obstacles into growth opportunities
Persistent, Passionate Developers: Thriving on complex, long-term challenges
Impact-Driven Creators: Crafting intuitive tools for developers
Rapid Learners and Un-learners: Adapting quickly in our fast-paced technological world

Vacancy posted 5 days ago
