Staff Site Reliability Engineer - Observability

$147k - $202k

Okta

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Position Overview:

We are seeking a highly technical Staff Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • Splunk Engineering: Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

  • Log Management: 5+ years of experience managing Splunk Cloud at scale (1000+ SVCs), including Workload Management (WLM) and HEC optimization.
  • Visualization: Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
  • SRE Mindset: 5+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Programming Proficiency: Strong coding skills in SPL and Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Charge-back app: Experience implementing a Splunk charge-back app for usage reporting.
  • Cloud Platforms: Experience managing native observability tools within AWS or GCP.

Additional requirements:

  • This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
  • This person must attend in-person onboarding in our San Francisco office during the first week of employment.

Below is the annual base salary range for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program please visit: .   

The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is between $147,000 and $202,000 USD.


The Okta Experience

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and convictions records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at  .
Vacancy posted a month ago