Staff Site Reliability Engineer - Observability GCP

$194k - $267k

Okta

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud to own and expand our Observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • GCP Observability Engineering: Optimize the collection, processing, and storage of Observability data to ensure high reliability and low latency of our Splunk and Grafana services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

  • GKE: Minimum 5+ years of experience scaling and managing observability on Google Cloud Platform.
  • Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
  • SRE Mindset: Minimum 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Grafana Loki: Experience in migrating Splunk to Grafana Loki.
  • Other Cloud Platforms: Experience managing native observability tools within AWS.

Additional requirements:

  • This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.

P24517_3387022

Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more, visit our Total Rewards program page.

The annual base salary range for this position for candidates located in the San Francisco Bay Area is $194,000 to $267,000 USD.

The Okta Experience

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
Vacancy posted 11 days ago