Senior Staff Site Reliability Engineer
$126k - $204.5k
Palo Alto Networks, Inc.
Job Summary
The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, you will operate and maintain a large‑scale GCP environment, including the design, implementation, and continuous enhancement of our comprehensive observability systems. To succeed in this role, you will need deep knowledge of modern observability and monitoring tools and practices, including experience managing high‑cardinality metrics, implementing tracing, and operationalizing large‑scale logging solutions. You will collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems’ performance and health.

Key Responsibilities
- Utilize expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure, leveraging cloud‑native technologies.
- Improve monitoring processes, alerts, and metrics, and work with development teams to ensure that all of our services have the right monitoring and metrics in place to detect problems before our customers do.
- Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services.
- Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto‑scaling.
- Stay up to date with cutting‑edge technologies, evaluate their potential impact on our operations, and implement them when appropriate.
- Provide follow‑the‑sun operational coverage for our production observability infrastructure.
- Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services.

Qualifications

Required Qualifications
- 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level.
- High proficiency with Thanos, Prometheus, Grafana, OpenTelemetry, and other monitoring tools.
- Clear understanding of incident and alert management using tools like PagerDuty and Prometheus Alertmanager.
- High proficiency in either Google Cloud Platform or Amazon Web Services.
- High proficiency with Kubernetes and Docker for container orchestration.
- High proficiency in Python programming and Linux shell commands.
- Experience with Ansible and Terraform for infrastructure as code.

Preferred Qualifications
- Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams in different time zones.
- Ability to effectively troubleshoot and address emerging and complex problems.
- Ability to operate independently, make decisions, take action, and take responsibility.

Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non‑sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
$126,000.00 - $204,500.00/yr

Equal Opportunity Employer
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics. All information will be kept confidential according to EEO guidelines.

