Senior Site Reliability Engineer

Careers at Drata

  • Senior Site Reliability Engineer
    Hybrid - San Francisco

**Our Mission & Values:** At Drata, we help companies earn and keep the trust of their users, customers, partners, and prospects. We’re the proof layer that shows great companies deserve the trust they aim to build.

We live our values every day. Built on Trust means consistency is everything. Act with Integrity by always doing the right thing. Being Customer-Obsessed keeps the people we serve at the center of our work. Competitive Fire drives us to push ourselves harder than anyone else. Diversity brings unique perspectives that lead to better solutions. Automation First ensures we save time and money by making efficiency a priority.

**Our Culture & Work Style:** At Drata, we’re not just building software - we’re building a mindset. Everything we do springs from:

* **Be a Driver (Owner-Operator Mentality):** Own your work. Improve relentlessly. Deliver results.
* **Move at Drata Speed (Precision & Velocity):** Fast decisions. Quick learning. Immediate impact.
* **Stay Mission-Driven (Customer-Obsessed):** Challenge assumptions. Deliver value. Stay hungry.

We pair that high-velocity culture with a **thoughtful hybrid model** because we believe flexibility and collaboration both matter. That’s why in the Bay Area we come together **in-office Tuesday through Thursday**, our high-impact collaboration days where teams align, strategize, and innovate. Mondays and Fridays are flexible, giving you space for focused work, balance, and autonomy.

If you thrive when you’re empowered, energized, and working with smart, mission-driven people, you’ll feel at home here. The best way to understand the Driver’s Mindset is to see it in action.
We’re an award-winning, mission-driven team of **600+ people worldwide**, united by a culture that values trust, speed, and continuous growth.

* Watch our CEO, Adam Markowitz, discuss the hyper-growth journey, from $0 to $100M ARR in just four years.
* Explore our "Life at Drata" page for employee testimonials on our collaborative culture and the growth opportunities available.
* See why we are consistently recognized on Fortune's Best Workplaces lists.
* Connect with us on socials: follow us for company updates, employee stories, and career news.

**Job Summary:** Drata's SRE team operates as both a central engineering function and an embedded reliability practice. You'll be part of a close-knit SRE team where you grow your career, shape standards, and collaborate with peers, while also serving as the dedicated reliability partner for one of Drata's product engineering teams across the full lifecycle of their work.

This is a highly technical role at the intersection of software engineering and systems engineering. The best SREs at Drata are engineers first: they solve problems by building solutions, not by executing manual processes. Automation is a core value, and nowhere is that more visible than in how we approach reliability.

Our infrastructure runs on AWS across multiple accounts, defined entirely in Terraform. You'll work across a modern cloud-native stack to help Drata scale reliably for a rapidly growing customer base.

**What you’ll do:**

*Reliability Architecture for Your Product Team*

You are the reliability expert for your aligned product team.
You engage early - during architecture reviews and design discussions - to surface risks before they become incidents.

* Lead Production Readiness Reviews (PRRs) before new services launch, with the authority to flag gaps and gate launches when critical reliability standards aren't met
* Partner with product engineering leads and staff engineers to define SLOs and SLIs for critical services, turning reliability from a vague goal into a measurable commitment
* Participate in team planning and architecture reviews to provide proactive reliability guidance
* Build reusable artifacts - SLO templates, observability checklists, alerting standards, reference dashboards - that raise the reliability floor across the team, not just the services you touch directly

*Eliminating Toil Through Engineering*

You handle operational needs from your product team, but your job isn't to be a help desk. Your goal is to make each request the last of its kind. When an engineer needs something, your priority is: automate it so anyone can do it → document it so the team can self-serve → execute it manually only as a last resort.

* Build and maintain Datadog monitors, dashboards, and alert routing - enforcing infrastructure-as-code standards via Terraform so those resources are owned, versioned, and auditable
* Handle infrastructure requests: ECS task management, secret rotations, Terraform changes, capacity adjustments
* Identify repeated manual work and convert it into self-service tooling or runbooks
* Audit existing services for reliability anti-patterns and surface top risks before they cause incidents

*Central SRE Platform Work*

Beyond your product team, you contribute to cross-cutting infrastructure, tooling, and standards that benefit every team at Drata.
Recent examples include automated Datadog governance workflows, dynamic AWS account provisioning, and disaster recovery exercises.

* Design and build shared platform infrastructure - reusable Terraform modules, standardized observability stacks, service templates - so reliability improvements compound across the organization
* Participate in the on-call rotation and lead incident response when needed; conduct thorough post-incident reviews to drive lasting fixes
* Design and manage CI/CD pipelines using GitHub Actions
* Contribute to evolving SRE standards, tooling, and practices across the organization

**What you'll bring:**

* 6+ years of experience in Site Reliability Engineering, Cloud Engineering, or building and maintaining scalable, resilient services
* Robust knowledge of cloud computing technologies: Terraform, Docker, Git, and Linux
* Hands-on experience with Datadog for monitoring, alerting, dashboards, SLO tracking, and distributed tracing
* Experience building software systems as a software engineer
* Experience developing tooling and automation in Python and/or Bash
* Experience with CI/CD pipeline automation, specifically GitHub Actions
* Experience with disaster recovery practices and incident management
* Strong understanding of observability concepts - monitoring, logging, distributed tracing, and metrics - and how to apply them to production systems
* Experience with container orchestration and deployment technologies including AWS ECS Fargate and/or Kubernetes
* Experience working with relational databases (MySQL proficiency is a plus)
* Ability to take ownership of problems and act on them independently in a constantly evolving environment

**Nice to Have:**

* Experience with AIOps - using AI/ML-based tooling for anomaly detection, predictive alerting, or automated incident triage
* Familiarity with the reliability characteristics of AI/ML-backed services (e.g., LLM inference latency, non-determinism, prompt pipeline observability)
* Experience with the JavaScript/Node.js ecosystem
* Certified Kubernetes Administrator (CKA)
* Familiarity with compliance frameworks like SOC 2, ISO 27001, or NIST

**AI Experience** (required - at least one of the following):

* Hands-on experience using AI-assisted development tools (e.g., GitHub Copilot, Cursor, or similar) to accelerate automation, scripting, or infrastructure work
* Demonstrated use of AI/AIOps capabilities for reliability tasks - anomaly detection, incident triage, runbook generation, or alert noise reduction
* Familiarity with the operational characteristics of AI/ML-backed services and what it means to make them observable and reliable in production
* Demonstrated passion for AI through personal projects, contributions, or continuous learning in the context of infrastructure or reliability engineering

**How we support you:**

At Drata, our people are our strongest advantage, and we

Vacancy posted 1 day ago
Similar jobs that could be interesting for you, based on the Senior Site Reliability Engineer in San Francisco, CA vacancy:
  • US Corp. is seeking a Lead Site Reliability Engineer to spearhead our mission of delivering highly available and performant systems. With an average of over 12 years of industry experience, the successful candidate will bridge the gap between software development and systems... 
    Senior

    Axiom Pursuits

    San Francisco, CA
    22 hours ago
  • OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... 
    Senior
    Flexible hours

    OutSystems, Inc.

    San Francisco, CA
    22 hours ago
  • $227.2k - $324.5k

     ...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization...  ...automation. We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability... 
    Senior
    Full time
    Contract work
    Temporary work
    Local area
    Flexible hours

    Tubi

    San Francisco, CA
    4 days ago
  •  ...co‑founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing. About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and... 
    Senior

    deCircle

    San Francisco, CA
    4 days ago
  •  ...Responsibilities Lead and onboard services and teams to the reliability tenets. Establish and maintain Service Level Objectives (...  ...Science or equivalent. 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale. History of... 
    Senior

    OutSystems, Inc.

    San Francisco, CA
    22 hours ago
  •  ...work from home day is currently Tuesday. Engineering at Lambda is responsible for building...  ...observability adoptable and improve product reliability. Lead members of other engineering teams...  ...in Go Have 5+ years of experience in Site Reliability Engineering practices Possess... 
    Senior
    Work at office
    Local area
    Work from home

    Lambda

    San Francisco, CA
    4 days ago
  • What you’ll do As a Senior Site Reliability Engineer, you’ll work closely with product teams in Spend to deliver and maintain scalable, reliable cloud infrastructure in support of key product initiatives. Aligned to the roadmap, you’ll lead on infrastructure design and... 
    Senior

    Airwallex

    San Francisco, CA
    4 days ago
  •  ...acquisition, and Connor was a machine learning research engineer at Scale AI. The rest of our team comes from...  ...redefining go-to-market with state-of-the-art AI. As a Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes of data... 
    Senior

    Unify

    San Francisco, CA
    22 hours ago
  • We are seeking a Sr. Site Reliability Engineer to join our team and run critical infrastructure for our blockchain and web applications. You’ll learn to deploy and maintain a fleet of RPC and validator nodes for multiple blockchain networks. You’ll also provide guidance... 
    Senior
    Remote job

    Blockchain Works

    San Francisco, CA
    7 days ago
  • $140k - $220k

    About the Job: You’ll own reliability and operational excellence for Pylon's production systems. This means designing and implementing...  ...scale as we grow. You'll build tooling that makes the entire engineering team more effective, establish on-call rotations and runbooks... 
    Senior

    Pylon

    San Francisco, CA
    3 days ago
  •  ...alongside clinicians to make that possible. We’re a team of doctors, engineers, designers, researchers, and creatives building tools that...  ...for leading incidents end-to-end. Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes... 
    Senior
    Work at office
    Worldwide

    Heidi Health Ltd

    San Francisco, CA
    22 hours ago
  • $50 per hour

     ...years of professional SRE experience 5+ years of experience contributing to architecture and design (architecture, design patterns, reliability and scaling) of new and current systems Bachelor's Degree in Computer Science or related field, or 8+ years relevant work... 
    Senior
    Temporary work
    Work experience placement

    Epoch Biodesign

    San Francisco, CA
    4 days ago
  • $60 per hour

    Senior Site Reliability Engineer, Seattle (Hybrid). Full-time. About Us: Supio is a trusted AI platform purpose-built for law firms, reshaping how data drives impactful outcomes. Our innovative approach blends technology with deep legal expertise,... 
    Senior
    Full time
    Work at office
    Flexible hours

    Bonfirevc

    San Francisco, CA
    22 hours ago
  • Senior Site Reliability Engineer. Locations: US - San Francisco Bay Area. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: R1478. There are NO limits to your career: come... 
    Senior
    Immediate start
    Remote work
    Worldwide

    OutSystems Inc.

    San Francisco, CA
    22 hours ago
  • $166.9k - $225.9k

    Job Summary Drata's SRE team operates as both a central engineering function and an embedded reliability practice. You'll be part of a close-knit SRE team...  ...organization. What you’ll bring 6+ years of experience in Site Reliability Engineering, Cloud Engineering, or... 
    Senior
    Flexible hours

    Drata

    San Francisco, CA
    1 day ago
  • $127k - $249k

    The Team: Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational...  ...fleet, alongside the critical components that ensure cluster reliability and security (e.g., CoreDNS, cert-manager, and Gatekeeper). As... 
    Senior
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    2 days ago
  • CloudDevs: Senior Site Reliability Engineer (SRE). CloudDevs works with fast-moving, venture-backed startups throughout the US. We’re building a pool of world-class Site Reliability Engineers for current roles and upcoming opportunities. You’ll both be positioned... 
    Senior

    The10minutecareersolution

    San Francisco, CA
    2 days ago
  • $220k - $235k

     ...are seeking a strategic, high‑output Staff/Senior Staff SRE to define the future of our cloud platform and champion engineering excellence across Ironclad. In this role,...  ...leadership and strategic direction for the Site Reliability Engineering team and our broader Cloud... 
    Senior
    Full time
    Work at office

    Ironclad Inc.

    San Francisco, CA
    22 hours ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda Andromeda Cluster was founded by Nat Friedman and Daniel Gross to give early‑stage startups access to the kind of scaled AI infrastructure once reserved... 
    Senior
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    22 hours ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 
    Senior

    Hyperbolic Labs

    San Francisco, CA
    2 days ago
  • Drata is seeking a Senior Site Reliability Engineer in San Francisco. In this role, you will engage in reliability architecture for product teams, lead production readiness reviews, and build automation around monitoring and alerting. The ideal candidate has at least 6... 
    Senior

    Careers at Drata

    San Francisco, CA
    1 day ago
  • $181k - $263k

    Senior Staff Site Reliability Engineer. Locations: San Francisco. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: JR01220...  ...support. We are looking for a Senior Staff Site Reliability Engineer who will set the technical direction for reliability... 
    Senior
    Work from home
    Flexible hours
    Night shift

    LiveRamp

    San Francisco, CA
    22 hours ago
  • $127k - $249k

    We are looking for an experienced Senior or Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Staff SRE, you will be very hands‑on technically while also mentoring a small team of SREs. The InfraSec team collaborates... 
    Senior
    Local area
    Remote work
    Flexible hours

    Insider, Inc.

    San Francisco, CA
    22 hours ago
  • Airwallex is seeking a Senior Site Reliability Engineer in San Francisco, California, to work with product teams to build and maintain robust cloud infrastructure. In this role, you will lead critical infrastructure projects, ensuring the reliability and performance of... 
    Senior

    Airwallex

    San Francisco, CA
    22 hours ago
  •  ...cloud-native systems. As a Staff Platform Engineer, you will play a critical role in...  ...technical leadership role. You will own reliability for major platform domains, design scalable...  ...Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a... 
    Senior

    Saviynt

    San Francisco, CA
    2 days ago
  • $232k - $319k

     ...to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and...  ...with self-service Accelerate the velocity of SRE and product engineering by developing robust platforms, powerful tooling, and... 
    Senior
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta, Inc.

    San Francisco, CA
    22 hours ago
  • $151.5k - $252.5k

    A leading technology firm is seeking a Senior Site Reliability Engineer to join their Data Cloud engineering team in San Francisco. The role requires expertise in Azure infrastructure and SaaS applications, focusing on building reliable, scalable systems. The ideal candidate... 
    Senior

    Veeam

    San Francisco, CA
    1 day ago
  • A leading language learning platform is seeking an experienced SRE Engineer to ensure the reliability and resilience of their infrastructure. Responsibilities include leading incident response, improving observability, and collaborating with various teams to enhance platform... 
    Senior

    Speak

    San Francisco, CA
    3 days ago
  •  ...management. We have become a multibillion‑dollar asset manager, and we have ambitious goals for the future. As a Senior Cluster Site Reliability Engineer (SRE), you will help scale our research compute cluster to meet our growing needs, and you will leverage engineering... 
    Senior
    Local area

    The Voleon Group

    Berkeley, CA
    3 days ago
  • $163k - $203k

    Your role in our mission: You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role - you will maintain the... 
    Senior
    Work experience placement
    Work at office
    Remote work
    Flexible hours
    2 days per week

    GoTo Meeting

    San Francisco, CA
    22 hours ago
