Principal Site Reliability Engineer
Zscaler
We are looking for a Principal Site Reliability Engineer to join our team. This role is available either as a hybrid position (on-site in San Jose, CA, 3 days a week) or as a fully remote position, reporting to Production Engineering in the Cloud Infrastructure & Operations department. Join Zscaler to be a force multiplier for the reliability of a global platform protecting over 15 million users.
In this role, you will provide the technical vision and hands-on execution to drive an "automation-first" culture across the company. By maturing our observability and architectural standards, you will directly reduce our Mean Time to Mitigate (MTTM) and shape the scalability of our globally distributed, multi-cloud infrastructure.
Role Expectations
- Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
- Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
- Implement and maintain comprehensive observability tooling (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
- Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
- Collaborate with Engineering and partner teams to conduct operability reviews
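For readers less familiar with error budgets, the arithmetic behind an availability SLO can be sketched as follows. This is a minimal illustration, not Zscaler's tooling: it assumes a request-based availability SLI (successful requests divided by total requests over a fixed window), and the function name `error_budget_remaining` and the example numbers are hypothetical.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent in a window.

    Assumes a request-based availability SLI: SLI = (total - failed) / total.
    slo_target: e.g. 0.999 means 99.9% of requests must succeed.
    Returns 1.0 when nothing is spent, 0.0 when the budget is exhausted,
    and a negative value when the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic in the window, so no budget has been spent
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
# The SLO tolerates about 1,000 failures, so roughly a quarter is spent.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget remains")  # 75%
```

When the remaining fraction approaches zero, a common policy is to slow feature rollouts and prioritize reliability work until the budget recovers; the exact policy is a team decision rather than something the formula dictates.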
Success Profile
- You act like an owner, with integrity and a bias for action.
- You are a pragmatic builder obsessed with creating, iterating, and shipping.
- You champion simplicity by distilling complex problems into clear, actionable plans.
- You are data-driven, valuing evidence over assumptions.
- You think at scale, building solutions and processes that can sustain a high-growth global organization.
Minimum Qualifications
- 10+ years of experience managing reliability, scalability, and availability for large-scale production services
- Foundational understanding of AI/ML technologies and experience leveraging, securing, or positioning AI-driven solutions to optimize outcomes within your functional domain
- Deep expertise in programming (e.g., Python, Go, or C/C++)
- Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
- Experience leveraging ITIL frameworks and incident data during high-stakes incident management and 24/7 on-call rotations to drive service maturity through systematic problem management and technical operability reviews
Preferred Qualifications
- Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform)
- Experience with chaos engineering and disaster recovery planning at scale
- Expertise in global routing (BGP) and traffic tunneling (GRE, IPSec) with a deep understanding of L7 proxy architectures (HAProxy), DNS at scale, and OS networking stack internals