Staff Site Reliability Engineer - Observability GCP

$194k - $267k

Okta

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud to own and expand our Observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
GCP Observability Engineering: Optimize the collection, processing, and storage of Observability data to ensure high reliability and low latency of our Splunk and Grafana services.
Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

GKE: Minimum 5+ years of experience scaling and managing observability on Google Cloud Platform.
Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
SRE Mindset: Minimum 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
Grafana Loki: Experience migrating from Splunk to Grafana Loki.
Other Cloud Platforms: Experience managing native observability tools within AWS.

Additional requirements:

This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.

Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies.

The annual base salary range for this position for candidates located in the San Francisco Bay Area is $194,000 to $267,000 USD.

The Okta Experience

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Vacancy posted a month ago
