Senior Staff Site Reliability Engineer
$126k - $204.5k
Palo Alto Networks
Job Summary
The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, your role involves operating and maintaining a large‑scale GCP environment, including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, you will have a deep knowledge of modern observability and monitoring tools and practices, having managed high cardinality metrics, implemented tracing, and operationalized large‑scale logging solutions. As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems’ performance and health.

Key Responsibilities
- Utilize expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure, leveraging cloud‑native technologies.
- Improve monitoring processes, alerts, and metrics, and work with development teams to ensure that all of our services have the right monitoring and metrics in place to detect problems before our customers do.
- Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services.
- Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto‑scaling.
- Stay up‑to‑date with cutting‑edge technologies, evaluate their potential impact on our operations, and implement them when appropriate.
- Provide follow‑the‑sun operational coverage in the production of our Observability infrastructure.
- Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services.

Qualifications

Required Qualifications
- 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level.
- High proficiency with Thanos, Prometheus, Grafana, OpenTelemetry, and other monitoring tools.
- Clear understanding of incident and alert management using tools like PagerDuty and Prometheus Alertmanager.
- High proficiency in either Google Cloud Platform or Amazon Web Services.
- High proficiency with Kubernetes and Docker for container orchestration.
- High proficiency in Python programming and Linux shell commands.
- Experience with Ansible and Terraform for infrastructure as code.

Preferred Qualifications
- Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams in different time zones.
- Ability to effectively troubleshoot and address emerging and complex problems.
- Ability to operate independently, make decisions, take action, and take responsibility.

Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non‑sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
$126,000.00 - $204,500.00/yr

Equal Opportunity Employer
Palo Alto Networks is an equal opportunity employer.
We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics. All information will be kept confidential according to EEO guidelines.
Palo Alto Networks, Inc.
