Senior Site Reliability Engineer
Okta, Inc.
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
Get to know Okta
Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.
At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box; we're looking for lifelong learners and people who can make us better with their unique experiences.
Join our team! We're building a world where Identity belongs to you.
The Engineering Opportunity
We are looking for an experienced Senior Site Reliability Engineer to join Okta's Emerging Products Group (EPG). Our mission is to build highly reliable, scalable, and secure cloud services that our customers can trust. We embrace an automation-first mindset and continuously invest in platform engineering, observability, and operational excellence to enable our engineering teams to move quickly and safely.
This role is ideal for an experienced Site Reliability Engineer who enjoys solving complex technical challenges at scale, building automation, and improving the reliability of production systems. You will serve as a key contributor within the EPG SRE organization, partnering closely with software engineers, architects, and product teams to design, build, and operate world-class cloud services. The ideal candidate exemplifies the philosophy of "if you have to do it more than once, automate it" and possesses a strong passion for continuous improvement, operational excellence, and software engineering.
What You'll Be Doing
Reliability & Operations
- Design, build, and operate large-scale cloud infrastructure and production services.
- Participate in an on-call rotation supporting highly available customer-facing systems.
- Lead incident response efforts and drive post-incident reviews focused on systemic improvements.
- Define, measure, and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
- Partner with engineering teams to improve service availability, scalability, performance, and resilience.
- Continuously improve observability through metrics, logging, tracing, dashboards, and alerting.
- Develop software, automation, and infrastructure using Go, Python, Terraform, and related technologies.
- Eliminate operational toil through automation, tooling, and platform engineering.
- Improve deployment safety and operational workflows through CI/CD and GitOps practices.
- Collaborate on modernizing existing workloads and aligning them with evolving platform capabilities.
- Build self-service platforms, operational guardrails, and automation that improve developer velocity while maintaining reliability and security.
- Contribute to and drive reliability initiatives within the product group.
- Guide engineers in adopting operational best practices and reliability engineering principles.
- Mentor engineers through technical collaboration, design reviews, incident analysis, and knowledge sharing.
- Support architecture and operational decisions through data-driven recommendations and engineering expertise.
- Execute projects from conception through production rollout and long-term operational ownership.
- Explore and apply AI-assisted engineering techniques to improve operational efficiency, incident response, troubleshooting, and automation.
- Identify opportunities to leverage emerging technologies to reduce toil and improve engineering productivity.
- Infrastructure/Orchestration: Kubernetes (EKS/GKE), Terraform, Helm, Git, ArgoCD, GitOps
- Programming: Golang, Python
- Observability: Datadog, Splunk
- Data Stores: PostgreSQL, Redis, OpenSearch
Technical Excellence
- Strong experience operating large-scale production services in AWS and/or GCP.
- Deep expertise with Kubernetes in production environments.
- Experience troubleshooting Kubernetes networking, storage, scheduling, scaling, and workload lifecycle issues.
- Extensive experience with Infrastructure as Code technologies such as Terraform and Helm.
- Strong software engineering skills in Golang and/or Python.
- Experience building automation and internal engineering platforms.
- Experience operating and troubleshooting distributed data platforms such as PostgreSQL, Redis, OpenSearch, MySQL, Cassandra, or similar technologies.
- Strong understanding of cloud networking fundamentals including DNS, load balancing, ingress, TLS, service networking, and traffic management.
- Experience with observability platforms, monitoring strategies, and production telemetry.
- Experience with or strong interest in AI-assisted engineering and operational automation.
- Strong expertise operating customer-facing production systems.
- Experience leading incident response and driving operational improvements.
- Deep understanding of reliability engineering concepts including SLIs, SLOs, error budgets, and capacity planning.
- Strong understanding of CI/CD pipelines, deployment strategies, and automation-first operational practices.
- Proven ability to balance reliability, scalability, security, and engineering velocity.
- Understanding of cloud security fundamentals, IAM, secrets management, and secure infrastructure design.
- Experience implementing operational controls and best practices in regulated or security-sensitive environments is a plus.
- Demonstrated experience contributing to complex engineering initiatives.
- Strong collaboration and communication skills.
- Experience working effectively within globally distributed engineering organizations spanning multiple timezones and cultures.
- Experience mentoring engineers and elevating technical capabilities within an organization.
- Ability to collaborate on technical direction through expertise, partnership, and execution.
- Experience operating SaaS platforms serving large-scale customer workloads.
- Experience working within Kubernetes-based microservices environments.
- Experience supporting globally distributed production environments.
- Experience with GitOps and ArgoCD.
- Experience implementing AI-assisted operational tooling or automation workflows.
#P22403
The Okta Experience
- Supporting Your Well-Being
- Driving Social Impact
- Developing Talent and Fostering Connection + Community
We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.
If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.
Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
Vacancy posted 2 days ago

