Manager, Site Reliability Engineering
$204k - $306k
Okta
Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

San Francisco, California

This position requires 2 days a week in our San Francisco office.

The IDaaS Site Reliability Engineering Group

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions. The service is designed for high throughput and 99.999% availability. We're looking for a technical leader to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and tooling.

As the Manager of Infrastructure Platform and Shared Services, you will oversee multiple teams focused on edge networking, the Kubernetes (K8s) platform, CI/CD, observability, and automation platform and tooling.

What you’ll be doing

- Manage a team of SREs supporting the workloads and teams behind our IDaaS platform.
- Drive the microservice journey, DevOps maturity, and workload reliability in tandem with architects and teams across the organization.
- Accelerate the velocity of SRE and product engineering by developing powerful tooling, intuitive self-service capabilities, and robust self-healing patterns.
- Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
- Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
- Improve SDLC processes for cloud infrastructure as code, including the maturity of CI/CD pipelines and of change and release management.
- Manage service and business expectations and prioritize resource allocation.
- Maintain a deep knowledge of industry best practices, evolving trends, and technologies.

What you’ll bring to the role

- 3+ years of experience in technical leadership and people management
- Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
- Experience running large-scale infrastructure platforms supporting a SaaS/cloud service in a public cloud, preferably AWS. Experience supporting a multi-cloud environment is a plus.
- Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
- Strong background and hands-on experience in software development, PaaS, and automation
- Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment
- Effective verbal and written communication and interpersonal skills
- Computer Science degree, related degree, or equivalent experience

Additional requirements: This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; 22 CFR 120.15) upon hire.

#LI-Hybrid
P24518_3462184
Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more, see Okta's Total Rewards program page.

The annual base salary range for this position for candidates located in the San Francisco Bay Area is $204,000 to $306,000 USD.
The Okta Experience

- Supporting Your Well-Being
- Driving Social Impact
- Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please see our full NYC AEDT Notice.