Staff Site Reliability Engineer - Observability GCP
$194k - $267k
Okta
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

We are seeking a highly technical Observability Site Reliability Engineer who specializes in Google Cloud to own and expand our observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, and scalable observability platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities
- Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
- GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency for our Splunk and Grafana services.
- Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
- Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.
Required Skills & Experience (The Essentials)
- GKE: Minimum 5 years of experience scaling and managing observability on Google Cloud Platform.
- Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
- SRE Mindset: Minimum 3 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
- Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
- Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
- Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.
Bonus Skills (The "Nice-to-Haves")
- Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
- Grafana Loki: Experience migrating from Splunk to Grafana Loki.
- Other Cloud Platforms: Experience managing native observability tools within AWS.
Additional requirements:
- This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
#LI-Hybrid
P24517_3387022
Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: .
The annual base salary range for this position for candidates located in the San Francisco Bay Area is $194,000 to $267,000 USD.
The Okta Experience
We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.