Senior Staff Site Reliability Engineer

$126k - $204.5k

Palo Alto Networks

Job Summary

The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, your role involves operating and maintaining a large‑scale GCP environment, including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, you will have a deep knowledge of modern observability and monitoring tools and practices, having managed high cardinality metrics, implemented tracing, and operationalized large‑scale logging solutions. As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems’ performance and health.

Key Responsibilities

Utilize expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure, leveraging cloud‑native technologies.
Improve monitoring processes, alerts, and metrics, and work with development teams to ensure that all of our services have the right monitoring and metrics in place to detect problems before our customers do.
Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services.
Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto‑scaling.
Stay up‑to‑date with cutting‑edge technologies, evaluate their potential impact on our operations, and implement them when appropriate.
Provide follow‑the‑sun operational coverage in the production of our Observability infrastructure.
Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services.
Qualifications

Required Qualifications

5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level.
High proficiency with Thanos, Prometheus, Grafana, OpenTelemetry, and other monitoring tools.
Clear understanding of incident and alert management using tools like PagerDuty and Prometheus Alertmanager.
High proficiency in either Google Cloud Platform or Amazon Web Services.
High proficiency with Kubernetes and Docker for container orchestration.
High proficiency in Python programming and Linux shell commands.
Experience with Ansible and Terraform for infrastructure as code.

Preferred Qualifications

Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams in different time zones.
Ability to effectively troubleshoot and address emerging and complex problems.
Ability to operate independently, make decisions, take action, and take responsibility.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non‑sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

$126,000.00 - $204,500.00/yr

Equal Opportunity Employer

Palo Alto Networks is an equal opportunity employer.
We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics. All information will be kept confidential according to EEO guidelines.

Vacancy posted 5 days ago

