Staff Site Reliability Engineer - Observability
$147k - $202k
Okta
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
Position Overview:
We are seeking a highly technical Staff Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities
- Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
- Splunk Engineering: Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
- Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
- Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.
Required Skills & Experience (The Essentials)
- Log Management: 5+ years of experience scaling and managing Splunk Cloud (1000+ SVCs), including Workload Management (WLM) and HEC optimization.
- Visualization: Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
- SRE Mindset: 5+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
- Programming Proficiency: Strong coding skills in SPL and Go for building internal tools and automating workflows.
- Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).
- Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.
Bonus Skills (The "Nice-to-Haves")
- Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
- Charge-back App: Experience implementing a Splunk charge-back app for usage reporting.
- Cloud Platforms: Experience managing native observability tools within AWS or GCP.
Additional requirements:
- This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g. a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee. 22 CFR 120.15) upon hire.
- The successful candidate must attend in-person onboarding in our San Francisco office during the first week of employment.
P14596_3372199
Below is the annual base salary range for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program please visit: .
The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is between $147,000 and $202,000 USD.
The Okta Experience
We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding please use this Form to request an accommodation.
Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at .

