Staff Site Reliability Engineer - Observability

$147k - $202k

Okta

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Position Overview:

We are seeking a highly technical Staff Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • Splunk Engineering: Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

  • Log Management: 5+ years of experience scaling and managing Splunk Cloud (1000+ SVCs), including Workload Management (WLM) and HEC optimization.
  • Visualization: Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
  • SRE Mindset: 5+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Programming Proficiency: Strong coding skills in SPL and Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Charge-back app: Experience implementing the Splunk charge-back app for usage reporting.
  • Cloud Platforms: Experience managing native observability tools within AWS or GCP.

Additional requirements:

  • This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
  • This person must attend in-person onboarding in our San Francisco office during the first week of employment.

Below is the annual base salary range for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies.

The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is $147,000 to $202,000 USD.

The Okta Experience

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and convictions records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice.
Vacancy posted a month ago