Senior Manager, Site Reliability Engineering

$227.2k - $324.5k

Tubi

About the Role:

Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems. Our mission is to engineer resilience from the ground up, enabling our product teams to innovate rapidly while ensuring our users have a stellar experience. We own the availability, latency, performance, and capacity of our platform, and we achieve our goals through a culture of data-driven decision-making, blameless learning, and relentless automation.

We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability Engineering team. You are more than a people manager or a tech lead; you are the strategic leader responsible for architecting our reliability roadmap. You will build and mentor a team of talented engineers, foster a culture of blameless learning and continuous improvement, and champion the engineering practices that allow us to balance rapid innovation with rock-solid stability. You will be a key influencer in our engineering leadership, partnering with peers across the organization to ensure reliability is a shared responsibility and a core tenet of our engineering culture.

What You'll Do:
  • Team Leadership & Mentorship:
    • Lead, mentor, and grow a team of Site Reliability Engineers. Foster a culture of innovation and technical excellence where engineers feel empowered to do their best work. Provide personalized coaching, create professional development plans, and guide the careers of senior and emerging talent within the team.
    • Establish equitable, sustainable on-call practices (including global coverage where applicable) that protect focus time and avoid burnout.
    • Define team rituals (runbook reviews, game days, and incident retros) that reinforce quality and learning.
  • Strategic Planning & Vision: Define and drive the multi-year technical strategy and vision for Tubi's observability and automation platforms. Partner with the infrastructure lead to align Tubi's infrastructure and SRE roadmaps, and with technology leaders to align the SRE roadmap with business objectives. Champion a data-driven approach to reliability, using Service Level Objectives (SLOs) and error budgets to ground productive conversations about the trade-off between risk and feature velocity.
  • Operational Excellence & Incident Management:
    • Own the end-to-end availability, performance, and efficiency of our critical user-facing services. Evolve our incident response practice to reduce Mean Time to Resolution (MTTR) and increase Mean Time Between Failures (MTBF). Champion a rigorous, blameless, and data-driven post-mortem culture so that we learn from both successes and failures, and drive engineering teams toward the systemic fixes and automation that prevent incidents from recurring.
    • Streamline our existing processes and practices, and collaborate with other teams to raise our production release standards.
    • Define and tune a 24×7 on-call rotation for low noise and fast response; act as executive escalation partner during major incidents.
    • Own disaster-recovery strategy (playbooks, failover drills, recovery simulations) and track SLO gaps with time-bound remediations.
  • Financial & Vendor Management: Own the SRE budget, tooling, and headcount. Manage relationships with key third-party vendors for our observability and SRE-related AI platforms, work with the infrastructure lead and finance team on contract negotiations, and ensure we derive maximum value from these investments.
  • Cross-Functional Collaboration: Act as a key influencer and strategic partner to leaders in Software Engineering, Product Management, and Infra/Sec. Drive the adoption of SRE best practices and principles throughout the organization, ensuring new services are designed for reliability, scalability, and observability from day one.
  • The AI Mandate: Building the Future of Observability with AI. You will not just manage a team that uses AI; you will lead the charge in building an AI-native SRE function. This strategic mandate requires a forward-thinking leader who understands both the potential and the pitfalls of integrating intelligent systems into critical operations. It includes:
    • AIOps Strategy Development: Developing and executing the strategy for integrating AIOps and machine learning into our observability stack. Your goal will be to move the team from a reactive monitoring posture to one of predictive maintenance and automated anomaly detection, fundamentally changing how we ensure reliability.
    • Accelerating Automation with AI: Championing the effective and responsible use of AI-assisted coding tools (e.g., Claude Code, Cursor) within the SRE team. You will set the standards and practices to leverage these tools to accelerate the development of automation, operational tooling, and infrastructure code.
    • Building the Business Case: Building the techno-economic case for new AI tooling, managing vendor relationships, and ensuring the cost-effective and secure implementation of these powerful systems. You must be able to articulate the ROI of these investments in terms of reduced downtime, improved operational efficiency, and faster incident resolution.
    • Fostering Critical AI Literacy: Fostering a culture that can critically evaluate, debug, and learn from the outputs of AI systems. This involves extending our blameless post-mortem philosophy to AI-driven actions and recommendations, ensuring that the team remains in control and understands the "why" behind automated decisions.
Your Background:
  • 8+ years of experience in a technical field, including at least one year in an engineering leadership role managing SRE, DevOps, or Production Engineering teams.
  • A deep, principled understanding of SRE tenets, including Service Level Indicators (SLIs), SLOs, error budgets, toil reduction, and capacity planning.
  • Exceptional communication, negotiation, and influencing skills, with the ability to articulate complex technical concepts and strategies to both technical and non-technical stakeholders at all levels of the organization.
  • A strong technical background as a hands-on software engineer or site reliability engineer prior to moving into management. Deep knowledge of AWS services (especially networking, IAM, EKS, ALBs/NLBs, Route 53, CloudWatch). Proven experience with Kubernetes in production (EKS preferred), including service exposure, networking, and availability engineering.
  • Hands-on familiarity with modern SRE tools and technologies, including Infrastructure as Code (e.g., Terraform, Ansible), container orchestration (Kubernetes), observability platforms (e.g., Prometheus, Grafana, Datadog, Splunk), incident tooling (e.g., PagerDuty, FireHydrant), deployment-safety tooling (e.g., Argo Rollouts, LaunchDarkly), and observability standards (e.g., OpenTelemetry).
#LI-Hybrid


Pursuant to state and local pay disclosure requirements, the annual pay range for this role is listed below; the final offer amount depends on education, skills, experience, and location. This role is also eligible for various benefits, including medical/dental/vision, insurance, a 401(k) plan, paid time off, and other benefits in accordance with applicable plan documents.

High-cost labor markets, such as but not limited to Los Angeles, New York City, and San Francisco:

$227,200-$324,500 USD

Tubi is a division of Fox Corporation, and the FOX Employee Benefits summary covers the majority of US employee benefits. The distinctions below outline the differences between Tubi and FOX benefits:
  • For US-based non-exempt Tubi employees, the FOX Employee Benefits summary accurately captures the Vacation and Sick Time.
  • For all salaried/exempt employees, in lieu of the FOX Vacation policy, Tubi offers a Flexible Time Off Policy to manage all personal matters.
  • For all full-time, regular employees, in lieu of FOX Paid Parental Leave, Tubi offers a generous Parental Leave Program, which allows parents twelve (12) weeks of paid bonding leave within the first year following the birth, adoption, surrogacy, or foster placement of a child, in addition to applicable government leave program(s) and FOX's short-term disability policy. This time is 100% paid through a combination of any applicable state, city, and federal leaves and wage-replacement programs, plus contributions made by Tubi.
  • For all full-time, regular employees, Tubi offers a monthly wellness reimbursement.
About Tubi:

Boldly built for every fandom, Tubi is a free streaming service that entertains over 100 million monthly active users. Tubi offers the world's largest collection of Hollywood movies and TV shows, thousands of creator-led stories and hundreds of Tubi Originals made for the most passionate fans. Headquartered in San Francisco and founded in 2014, Tubi is part of Tubi Media Group, a division of Fox Corporation.

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, gender identity, disability, protected veteran status, or any other characteristic protected by law. We will consider for employment qualified applicants with criminal histories consistent with applicable law.
Vacancy posted 3 days ago