
Staff Site Reliability Engineer - Kubernetes

$194k - $267k

Okta, Inc.

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Workforce Identity Cloud

Okta Workforce Identity Cloud (WIC) provides easy, secure access for your workforce so you can focus on other strategic priorities, like reducing costs and doing more for your customers.

If you like to be challenged and have a passion for solving large-scale automation, testing, and tuning problems, we would love to hear from you. The ideal candidate exemplifies the ethic of "If you have to do something more than once, automate it" and can rapidly self-educate on new concepts and tools.

Position Overview:

The Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services. This position focuses on architecting and managing reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation. The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh.

Key Responsibilities:
  • Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency.
  • AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS.
  • Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments.
  • Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.
  • Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement.
  • Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime.
  • Incident Management & Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner.
  • Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks.
  • Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams.



Required Qualifications:
  • 4+ years of experience with Kubernetes/Helm.
  • 4+ years of experience with Terraform.
  • 5+ years of experience with AWS.
  • Experience with multi-region cloud environments.
  • Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures.
  • Strong expertise in Kubernetes platform creation, management, and optimisation (e.g., setting up highly available clusters, networking, and storage).
  • Hands-on experience with Helm for Kubernetes application deployment and management.
  • Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimising resource usage.
  • Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features.
  • Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker).
  • Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation.
  • Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack.



Preferred Qualifications:
  • Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks).
  • Familiarity with Docker and containerization principles.
  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience).
  • Certifications: CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer are highly desirable.


Additional requirements:

  • This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
  • Requires in-person onboarding and travel to our San Francisco, CA HQ office or our Chicago office during the first week of employment.


Requisition ID: P16373_3396241

The annual base salary range for this position for candidates located in the San Francisco Bay Area is between $194,000 and $267,000 USD.

Below is the annual base salary range for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program please visit:

The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is between $174,000 and $214,000 USD.

The Okta Experience

  • Supporting Your Well-Being
  • Driving Social Impact
  • Developing Talent and Fostering Connection + Community


We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and convictions records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
Vacancy posted 1 day ago