Principal Site Reliability Engineer

Zscaler


We are looking for a Principal Site Reliability Engineer to join our team. This role is available either as a hybrid position (3 days a week in San Jose, CA) or as a remote position, reporting to Production Engineering in the Cloud Infrastructure & Operations department. Join Zscaler to be a force multiplier for the reliability of a global platform protecting over 15 million users.

In this role, you will provide the technical vision and hands-on execution to drive an "automation-first" culture across the company. By maturing our observability and architectural standards, you will directly reduce our Mean Time to Mitigate (MTTM) and shape the scalability of our globally distributed, multi-cloud infrastructure.

Role Expectations

Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
Collaborate with Engineering and partner teams to conduct operability reviews

Success Profile

You act like an owner with a bias for action and integrity.
You are a pragmatic builder obsessed with creating, iterating, and shipping.
You champion simplicity by distilling complex problems into clear, actionable plans.
You are data-driven, valuing evidence over assumptions.
You think at scale, designing solutions and processes built to last in a high-growth global organization.

Minimum Qualifications

10+ years of experience managing reliability, scalability, and availability for large-scale production services
Foundational understanding of AI/ML technologies and experience leveraging, securing, or positioning AI-driven solutions to optimize outcomes within your functional domain
Deep expertise in programming (e.g., Python, Go, or C/C++)
Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
Experience leveraging ITIL frameworks and incident data during high-stakes incident management and 24/7 on-call rotations to drive service maturity through systematic problem management and technical operability reviews

Preferred Qualifications

Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform)
Experience with chaos engineering and disaster recovery planning at scale
Expertise in global routing (BGP) and traffic tunneling (GRE, IPSec) with a deep understanding of L7 proxy architectures (HAProxy), DNS at scale, and OS networking stack internals


Vacancy posted 2 days ago

Similar jobs that could be interesting for you
Based on the Principal Site Reliability Engineer in San Jose, CA vacancy

Principal Site Reliability Engineer
$185k - $278k
...transforming them into scalable solutions. * Debugging OS and engineering issues within our provided Linux environment. *... ...efficiently. The Impact You Will Have: * Enhancing the reliability and performance of our engineering environment. * Streamlining...
Principal
Remote work
Synopsys
Sunnyvale, CA
3 days ago
Principal Site Reliability Engineer
$147k - $237.5k
...builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a Principal Site Reliability Engineer within the Cortex DevOps team, you will serve as a technical leader responsible for driving the reliability,...
Principal
Full time
Work at office
Palo Alto Networks
Santa Clara, CA
2 days ago
Principal Site Reliability Engineer (CIPE)
Job Summary. Note: This role requires US Citizenship. Your Career: As a Principal Site Reliability Engineer, you will serve as the technical authority for our cloud-native infrastructure. You’re responsible for architecting the reliability, scalability, and security of a...
Principal
Visa sponsorship
Work visa
Shift work
Palo Alto Networks, Inc.
Santa Clara, CA
3 days ago
Principal Site Reliability Engineer (AIOps)
$151.6k - $245.3k
Job Summary: Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance...
Principal
Palo Alto Networks, Inc.
Santa Clara, CA
1 day ago
Principal Site Reliability Engineer (U.S. Citizenship required)
$151.6k - $245.3k
...outcomes. About the Role: Palo Alto Networks runs a large infrastructure and is one of the largest GCP customers. As a Principal Site Reliability Engineer for the ADEM (Autonomous Digital Experience Management) team, you will be part of a team supporting the services that...
Principal
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
4 days ago
Principal Site Reliability Engineer, Google Cloud
$240k - $250k
...for our customers. As we scale globally, reliability, availability, and performance are not... ...—they are core product features. As a Principal Engineer, you will define and drive the... ...Development, Platform Engineering, or Site Reliability Engineering role, with a strong...
Principal
Full time
Saviynt
Milpitas, CA
2 days ago
Principal Cloud SRE & Automation Engineer
Palo Alto Networks, Inc. is seeking a Principal Site Reliability Engineer in Santa Clara, CA. This role involves supporting a large infrastructure and ensuring applications are production-ready, scalable, and reliable. You'll work closely with developers and researchers...
Principal
Palo Alto Networks, Inc.
Santa Clara, CA
3 days ago
Principal System Software Engineer - CUDA Driver
$272k - $431.25k
...Overview We are hiring senior engineers to work on the CUDA driver, a core component of NVIDIA’s platform for accelerating general purpose computation on the GPU. What you’ll be doing: Use your design abilities, coding expertise, and creativity to deliver the...
Principal
NVIDIA Corporation
Santa Clara, CA
1 day ago
Principal Systems Software Engineer, LPU
$272k - $431.25k
Principal Systems Software Engineer, LPU. Locations: US, CA, Santa Clara. Time type: Full time. Posted on: Posted Today. Job requisition id: JR202... ...the rest of the platform depends on; drive the hardest reliability and bring-up problems to root cause; and raise the throughput...
Principal
Shift work
NVIDIA Corporation
Santa Clara, CA
1 day ago
Principal SRE Engineer (US Citizen)
...usual and that goes for the talent we hire. We’re looking for a Principal SRE to join our InfoSec SRE team that owns the process of... ...Qualifications Must be a U.S. Citizen. BS/MS in Computer Science/Engineering or equivalent training, education, and experience in information...
Principal
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
1 day ago
Senior Site Reliability Engineer
$83k - $187k
...Senior Site Reliability Engineer OCI Incident Response is the first line of defense in maintaining the high availability of Oracle's cloud. We minimize customer-impacting events by making them shorter, less frequent, and less impactful through large-scale incident management...
Temporary work
Work experience placement
Flexible hours
Oracle
Santa Clara, CA
22 hours ago
Senior Site Reliability Engineer
$181.69k - $213.75k
...Senior Site Reliability Engineer. San Francisco, California; Santa Clara, California; Seattle, WA. The Company You'll Join: Carta connects founders, investors, and limited partners through world-class software, purpose-built for everyone in venture capital, private...
Full time
Work at office
Carta
Santa Clara, CA
3 days ago
Sr. Site Reliability Engineer
$128k - $216k
...consumers to one another millions of times a day - quickly, reliably, and securely. Any time you swipe your credit card, pay... ...scale, come make a difference at Fiserv. Job Title Sr. Site Reliability Engineer About Clover Clover is a pioneer in the fintech...
Worldwide
Fiserv
Sunnyvale, CA
2 days ago
Senior Staff Site Reliability Engineer
$126k - $204.5k
...As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and... ...team to influence the operability of the product and ensure the reliability and availability of our services. Qualifications...
Full time
Work at office
Palo Alto Networks
Santa Clara, CA
2 days ago
Sr Principal Software Engineer (L7 Security)
...critical part of delivering the highest revenue licenses the company offers. Our core Application Identification and Content Inspection Engine runs on hardware, virtualized, container and cloud‑delivered firewalls. Any features that require inspection or extraction of layer...
Principal
Palo Alto Networks, Inc.
Santa Clara, CA
3 days ago
Site Reliability Engineer
$170k - $200k
...Site Reliability Engineer We are seeking a talented and motivated Site Reliability Engineer to join our engineering team. You will be responsible for building, maintaining, and troubleshooting cloud service/cluster, infrastructure, and monitoring systems to ensure high...
Full time
Worldwide
Edelman
Sunnyvale, CA
2 days ago
Site Reliability Engineer (SRE)
...Title: Site Reliability Engineer (SRE) Location: Sunnyvale, CA (3x/week onsite) Contract Responsibilities:... ...types and exchange protocols). ~ Understanding of SRE principles including monitoring, alerting, error budgets, fault analysis...
Contract work
AceStack LLC
Sunnyvale, CA
22 hours ago
Principal Software Engineer, Systems/Solutions Test
$172k - $349k
Principal Software Engineer, Systems/Solutions Test. Locations: Sunnyvale, California, United States of America. Time type: Full time. Posted... ...validation of real-world customer deployments and ensure reliability, scalability, resiliency, and performance across highly...
Principal
Work experience placement
Work at office
2 days per week
Hewlett Packard Enterprise Development LP
Sunnyvale, CA
3 days ago
Site Reliability Engineer II
...keep the world running. Location: 5 on-site days a week in Sunnyvale, CA Headquarters. Our Team's Vision: Our Engineering team is driven by a culture that thrives... ...basis, you will work on enhancing system reliability and scalability of Illumio SaaS products,...
Work experience placement
Immediate start
Illumio
Sunnyvale, CA
5 days ago
Site Reliability Engineer, Enterprise Technology Services
...Site Reliability Engineer, Enterprise Technology Services At Apple, groundbreaking ideas quickly transform into extraordinary products and services that delight millions worldwide. If you're passionate about engineering and operating robust, large-scale systems, imagine...
Worldwide
Relocation
Apple
Sunnyvale, CA
22 hours ago
Site Reliability Engineer
...Site Reliability Engineer. Foxconn Industrial Internet (Fii) is a world-leading professional design and manufacturing service provider of communication network equipment, cloud service equipment, precision tools and industrial robots. FII provides customers with intelligent...
Permanent employment
Full time
Work at office
Local area
Foxconn Industrial Internet
San Jose, CA
5 days ago
Site Reliability Engineer
$145k - $175k
...Site Reliability Engineer (SRE) Bolt Graphics is a semiconductor startup based in Sunnyvale, CA building the fastest and most efficient graphics processors. We pride ourselves on our first principles approach to solving problems. We are energized by our mission to reduce...
Work at office
Immediate start
Work from home
Bolt Graphics
Sunnyvale, CA
23 hours ago
Site Reliability Engineer
Job Description: Experience with ticket support, Azure, Splunk, and ServiceNow is required; any Java experience is a plus. Ideally, candidates come from an enterprise background. Handling tickets for the Walmart environment. Splunk, ServiceNow...
3B Staffing LLC
Sunnyvale, CA
3 days ago
Site Reliability Engineer
...Site Reliability Engineer. Location: San Jose, CA. What You'll Do - Responsibilities: Engage in and improve the whole lifecycle of services, from inception and design, through automated deployment, operation and refinement. Work with all relevant teams to make...
Netpace
San Jose, CA
3 days ago
Site Reliability Engineer
...Job Title: Site Reliability Engineer. Location: San Jose, CA. Duration: Contract. Job Description: Extensive experience working with Linux flavors like RHEL/CentOS, shells, filesystems, and utilities. Knowledge of distributed computing...
Contract work
Immediate start
Syntricate Technologies
San Jose, CA
2 days ago
Principal Software Engineer - Rack Scale Systems Infrastructure
$272k - $431.25k
...a lasting impact on the world. At NVIDIA, as a Principal Rack Scale Systems Infrastructure Engineer, you will build and guide the development of software... ...deployment and integration needs. Establish reliability, security, validation, and left‑shift strategies that...
Principal
Shift work
NVIDIA Corporation
Santa Clara, CA
3 days ago
Principal Software Engineer - Large-Scale LLM Memory and Storage Systems
$272k - $425.5k
Principal Software Engineer – Large-Scale LLM Memory and Storage Systems. Locations: US, CA, Santa Clara; US, WA, Remote; US, MA, Remote. Time type: Full time. Posted on: Posted Today. Job...
Principal
Local area
Remote work
NVIDIA Corporation
Santa Clara, CA
2 days ago
Principal Rack-Scale System Software Engineer
$272k - $431.25k
NVIDIA Corporation is seeking a Principal Software Engineer to join the CSP Engagements team in Santa Clara, CA. This role is pivotal for driving rack-scale system software and firmware architecture, ensuring seamless integration, operation, and monitoring of systems at...
Principal
NVIDIA
Santa Clara, CA
5 days ago
Principal AI and ML Infra Software Engineer, GPU Clusters
$272k - $431.25k
We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure team. As an Engineer, you will have a pivotal role in enhancing efficiency for our researchers by implementing progressions throughout the entire stack...
Principal
NVIDIA Corporation
Santa Clara, CA
3 days ago
Staff Site Reliability Engineer
$175k - $250k
...Staff Site Reliability Engineer Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home...
Full time
Figure
San Jose, CA
3 days ago
