Senior Site Reliability Engineer
CodeRabbit
About CodeRabbit
CodeRabbit is an innovative research and development company focused on building extraordinarily productive human-machine collaboration systems. Our primary goal is to create the next generation of Gen AI-driven code reviewers: a symbiotic partnership between humans and advanced algorithms that significantly outperforms individual engineers. We combine language models with human ingenuity to push the boundaries of software development efficiency and quality.

The Role:
We are seeking an experienced Site Reliability Engineer to join our Platform Engineering team in the Bay Area. You'll be instrumental in ensuring the high availability, performance, and scalability of CodeRabbit's AI-powered code review platform. This role sits at the intersection of software engineering and systems operations, where you'll build the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale our services reliably.

As an SRE at CodeRabbit, you'll be responsible for enhancing the reliability of our critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. You'll work with cutting-edge technologies including large language models, real-time processing systems, and distributed architectures that operate at significant scale.

Key Responsibilities:
Infrastructure & Platform Ownership
- Design, implement, and maintain scalable infrastructure on Google Cloud Platform to support CodeRabbit's growing user base and processing demands
- Own and operate critical platform services
- Build and maintain Infrastructure as Code using Terraform to ensure consistent, reproducible, and version-controlled infrastructure deployments
- Establish and maintain SLI/SLO frameworks for all critical services, ensuring we meet our reliability commitments to users
- Implement comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation
- Conduct thorough incident response, root cause analysis, and post-mortem processes to continuously improve system reliability
- Optimize application and infrastructure performance to handle millions of pull request analyses with minimal latency
- Design and implement chaos engineering practices to proactively identify and resolve system weaknesses
- Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently
- Automate operational tasks including scaling, backup/recovery, security patching, and routine maintenance
- Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams
- Integrate security best practices into all infrastructure and platform services
- Implement and maintain security monitoring, vulnerability scanning, and compliance reporting
- Design secure network architectures including VPC configuration, firewall rules, and access control systems
- Establish and maintain disaster recovery procedures and business continuity planning
Qualifications:
- 7+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps Engineering roles
- Proven track record of managing production systems at scale, preferably in high-growth technology companies
- Experience with cloud platforms, particularly AWS or Google Cloud Platform (GCP), including compute, storage, networking, and managed services
- Strong background in containerization and orchestration platforms (Kubernetes, Docker)
- Programming Languages: Proficiency in Node.js and TypeScript for building automation tools, monitoring solutions, and platform services
- Infrastructure as Code: Advanced experience with Terraform for infrastructure provisioning and management
- Monitoring & Observability: Hands-on experience with Datadog or similar observability platforms (Prometheus, Grafana, ELK stack)
- Cloud Platforms: Comprehensive experience with GCP services including Compute Engine, GKE, Cloud Run, Cloud SQL, Cloud Storage, Load Balancing, and IAM
- Strong Linux/Unix systems skills
- Experience with network protocols, load balancing, and CDN technologies
- Knowledge of security principles and best practices for cloud infrastructure
- Familiarity with CI/CD tools and practices (Jenkins, GitLab CI, GitHub Actions)
- Understanding of microservices architecture and distributed systems principles
- Experience with AI/ML infrastructure and tools
- Background in managing high-traffic web applications and API services
- Experience with disaster recovery planning and execution
- Familiarity with compliance frameworks (SOC 2, ISO 27001)
- Contributions to open-source infrastructure or SRE tooling projects
- Experience with cost optimization and FinOps practices
- Knowledge of performance testing and capacity planning methodologies
- CodeRabbit is building the next generation of AI-native developer tooling, starting with code review. We combine large language models with deep software engineering context to help teams ship faster, catch more bugs, and make better architectural decisions at scale.
- We are a high-ownership engineering culture. That means no passive execution, no waiting for perfect tickets, and no narrowly defined task boundaries. Engineers here find problems before they're assigned, use AI as a core part of how they build, ship with judgment, and own outcomes from proposal to production.
- Our operating philosophy: bias toward action, ship the smallest coherent slice necessary, validate in proportion to risk, watch what happens, and make the system better. AI drafts; humans decide. Speed matters, but so does understanding what you ship.
- This opportunity will be energizing for people who want real ownership, pace, and high standards. It will feel uncomfortable for people who prefer slow consensus or heavily managed workflows.
- If you want to build tools that are changing how software gets written, and to be held to the standard that the best engineers thrive under, we'd love to talk.
- Collaborative Humans: Prioritizing collective intelligence
- Fearless Innovators: Turning obstacles into growth opportunities
- Persistent, Passionate Developers: Thriving on complex, long-term challenges
- Impact-Driven Creators: Crafting intuitive tools for developers
- Rapid Learners and Un-learners: Adapting quickly in our fast-paced technological world
Vacancy posted 5 days ago