Principal Site Reliability Engineer

Zscaler

We are looking for a Principal Site Reliability Engineer to join our team. This role is available as a hybrid opportunity 3 days a week in San Jose, CA or as a remote position, reporting to Production Engineering in the Cloud Infrastructure & Operations department. Join Zscaler to be a force multiplier for the reliability of a global platform protecting over 15 million users.

In this role, you will provide the technical vision and hands-on execution to drive an "automation-first" culture across the company. By maturing our observability and architectural standards, you will directly reduce our Mean Time to Mitigate (MTTM) and shape the scalability of our globally distributed, multi-cloud infrastructure.

Role Expectations

  • Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
  • Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
  • Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
  • Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
  • Work with Engineering and partner teams to conduct operability reviews

Success Profile

  • You act like an owner with a bias for action and integrity.
  • You are a pragmatic builder obsessed with creating, iterating, and shipping.
  • You champion simplicity by distilling complex problems into clear, actionable plans.
  • You are data-driven, valuing evidence over assumptions.
  • You think at scale, building solutions and processes that last in a high-growth global organization.

Minimum Qualifications

  • 10+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Foundational understanding of AI/ML technologies and experience leveraging, securing, or positioning AI-driven solutions to optimize outcomes within your functional domain
  • Deep expertise in programming (e.g., Python, Go, or C/C++)
  • Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
  • Experience applying ITIL frameworks and incident data during high-stakes incident management and 24/7 on-call rotations, driving service maturity through systematic problem management and technical operability reviews

Preferred Qualifications

  • Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform)
  • Experience with chaos engineering and disaster recovery planning at scale
  • Expertise in global routing (BGP) and traffic tunneling (GRE, IPsec) with a deep understanding of L7 proxy architectures (HAProxy), DNS at scale, and OS networking stack internals
Vacancy posted 2 days ago