Staff Site Reliability Engineer, Cloud Reliability Intelligence

$207k - $301k

Google Inc.

Google, Sunnyvale, CA, USA

Qualifications
Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
8 years of experience with data structures and algorithms.
3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
3 years of experience in a technical leadership role, overseeing projects.
Experience overseeing full-stack architectures, ensuring cohesion between backend data automation layers and engineering frontend.

Preferred Qualifications
Experience applying LLMs or Generative AI to automate workflows.
Familiarity with large-scale reliability analysis or policy conformance frameworks.

About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

On the SRE team, you'll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design.

SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

The Reliability Outcome Enablement team develops the products, core infrastructure, and datasets that drive and sustain Google Cloud Platform's (GCP's) reliability promises. We build the Evergreen intelligence platform, the core system that automates resilience across the GCP ecosystem. Every product team at Google (from BigQuery to Spanner) relies on our infrastructure and integrated data lake to keep their services bulletproof. We are currently expanding our platform to integrate Generative AI and LLM-driven workflows, moving from reactive tracking to a predictive system that catches failures and automates risk mitigation.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.
US: USD 207,000 – 301,000 + 20% bonus target + equity + benefits
Learn more about benefits at Google.

Responsibilities
Own the technical roadmap and long-term architecture for the Evergreen platform, including a unified data model for promise delivery across GCP.
Design and scale high-performance backend pipelines (Go, Java) and data-rich user interfaces (TypeScript, Angular) used by more than 10,000 Google engineers.
Prototype and productionize LLM-based features to parse unstructured incident data, automatically file risk tickets, and suggest reliability fixes.
Partner closely with Product Management, Data Science, and leadership to align multiple organizations on a unified approach to policy measurement and enforcement.

Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.

Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.

To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.

Equity is granted exclusively and discretionarily by Alphabet Inc. on the basis of an agreement concluded between you and Alphabet Inc. Alphabet Inc. is your sole contractual partner with respect to equity grants. GSU grants are not guaranteed, are discretionary, and are subject to approval by the Alphabet Inc. board of directors or its delegate, the terms of the relevant Alphabet Inc. stock plan, and your grant agreement. They have no impact on statutory payments. Current or past grants do not confer an acquired right.

Vacancy posted 3 days ago

