Staff Site Reliability Engineer - Observability
$147k - $202k
Okta
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
Position Overview:
We are seeking a highly technical Staff Observability Site Reliability Engineer specializing in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond basic monitoring to deliver a world-class, comprehensive, and scalable observability platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities
- Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
- Splunk Engineering: Optimize the collection, processing, and storage of log data to ensure high reliability and low latency for our Splunk services.
- Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
- Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.
Required Skills & Experience (The Essentials)
- Log Management: 5+ years of experience scaling and managing Splunk Cloud (1,000+ SVCs), including Workload Management (WLM) and HTTP Event Collector (HEC) optimization.
- Visualization: Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
- SRE Mindset: 5+ years of experience in an SRE, DevOps, or Systems Engineering role, with a focus on high-availability systems.
- Programming Proficiency: Strong coding skills in SPL and Go for building internal tools and automating workflows.
- Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).
- Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.
Bonus Skills (The "Nice-to-Haves")
- Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
- Charge-back App: Experience implementing a Splunk charge-back app for usage reporting.
- Cloud Platforms: Experience managing native observability tools within AWS or GCP.
Additional requirements:
- This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., U.S. citizen, national, lawful permanent resident, refugee, or asylee; see 22 CFR 120.15) upon hire.
- The successful candidate must attend in-person onboarding at our San Francisco office during the first week of employment.
#LI-MM
#LI-Hybrid
P14596_3372199
Below is the annual base salary range for candidates located in California (excluding the San Francisco Bay Area), Colorado, Illinois, New York, and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: .
The annual base salary range for this position for candidates located in California (excluding the San Francisco Bay Area), Colorado, Illinois, New York, and Washington is $147,000 to $202,000 USD.
The Okta Experience
We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this form to request an accommodation.
Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at .

