Staff Site Reliability Engineer - Observability

$147k - $202k

Okta

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Position Overview:

We are seeking a highly technical Staff Observability Site Reliability Engineer specializing in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, and scalable observability platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • Splunk Engineering: Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

  • Log Management: At least 5 years of experience scaling and managing large Splunk Cloud deployments (1,000+ SVCs), including Workload Management (WLM) and HTTP Event Collector (HEC) optimization.
  • Visualization: Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
  • SRE Mindset: At least 5 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Programming Proficiency: Strong coding skills in SPL and Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/EKS).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Charge-back App: Experience implementing a Splunk charge-back app for usage reporting.
  • Cloud Platforms: Experience managing native observability tools within AWS or GCP.

Additional requirements:

  • This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. citizen, national, lawful permanent resident, refugee, or asylee; see 22 CFR 120.15) upon hire.
  • The successful candidate must attend in-person onboarding in our San Francisco office during the first week of employment.

P14596_3372199

Below is the annual base salary range for candidates located in California (excluding the San Francisco Bay Area), Colorado, Illinois, New York, and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit:

The annual base salary range for this position for candidates located in California (excluding the San Francisco Bay Area), Colorado, Illinois, New York, and Washington is $147,000 to $202,000 USD.

The Okta Experience

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If a reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice.