Staff Site Reliability Engineer, Cloud Reliability Intelligence
$207k - $301k
Google Inc.
Google, Sunnyvale, CA, USA

Qualifications
- Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
- 8 years of experience with data structures and algorithms.
- 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
- 3 years of experience in a technical leadership role, overseeing projects.
- Experience overseeing full-stack architectures, ensuring cohesion between backend data automation layers and the engineering frontend.

Preferred Qualifications
- Experience applying LLMs or generative AI to automate workflows.
- Familiarity with large-scale reliability analysis or policy conformance frameworks.

About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. SREs also keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

On the SRE team, you'll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment.
We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

The Reliability Outcome Enablement team develops the products, core infrastructure, and datasets that drive and sustain Google Cloud Platform's (GCP's) reliability promises. We build the Evergreen intelligence platform, the core system that automates resilience across the GCP ecosystem. Every product team at Google (from BigQuery to Spanner) relies on our infrastructure and integrated data lake to keep their services bulletproof. We are currently expanding the platform to integrate generative AI and LLM-driven workflows, moving from reactive tracking to a predictive system that catches failures and automates risk mitigation.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: USD 207,000 – 301,000 + 20% bonus target + equity + benefits. Learn more about benefits at Google.

Responsibilities
- Own the technical roadmap and long-term architecture for the Evergreen platform, including a unified data model for promise delivery across GCP.
- Design and scale high-performance backend pipelines (Go, Java) and data-rich user interfaces (TypeScript, Angular) used by more than 10,000 Google engineers.
- Prototype and productionize LLM-based features that parse unstructured incident data, automatically file risk tickets, and suggest reliability fixes.
- Partner closely with Product Management, Data Science, and leadership to align multiple organizations on a unified approach to policy measurement and enforcement.

Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.

Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.

To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.

Equity is granted exclusively and discretionarily by Alphabet Inc. on the basis of an agreement concluded between you and Alphabet Inc. Alphabet Inc. is your sole contractual partner with respect to equity grants. GSU grants are not guaranteed, are discretionary, and are subject to approval by the Alphabet Inc. board of directors or its delegate, the terms of the relevant Alphabet Inc. stock plan, and your grant agreement. They have no impact on statutory payments. Current or past grants do not confer an acquired right.

