Staff Site Reliability Engineer, Cloud Reliability Intelligence

$207k - $301k

Google Inc.

Sunnyvale, CA, USA

Qualifications
  • Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
  • 8 years of experience with data structures and algorithms.
  • 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
  • 3 years of experience in a technical leadership role, overseeing projects.
  • Experience overseeing full-stack architectures, ensuring cohesion between backend data automation layers and the engineering frontend.

Preferred Qualifications
  • Experience applying LLMs or Generative AI to automate workflows.
  • Familiarity with large-scale reliability analysis or policy conformance frameworks.

About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have the reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

On the SRE team, you'll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while also striving to create an environment that provides the support and mentorship needed to learn and grow.

The Reliability Outcome Enablement team develops the products, core infrastructure, and datasets that drive and sustain Google Cloud Platform's (GCP's) reliability promises. We build the Evergreen intelligence platform, the core system that automates resilience across the GCP ecosystem. Every product team at Google (from BigQuery to Spanner) relies on our infrastructure and integrated data lake to keep their services bulletproof. We are currently expanding the platform to integrate Generative AI and LLM-driven workflows. The goal is to move from reactive tracking to a predictive system that catches failures and automates risk mitigation.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.
US: USD 207,000 – 301,000 + 20% bonus target + equity + benefits
Learn more about benefits at Google.

Responsibilities
  • Own the technical roadmap and long-term architecture for the Evergreen platform, including a unified data model for promise delivery across GCP.
  • Design and scale high-performance backend pipelines (Go, Java) and data-rich user interfaces (TypeScript, Angular) used by more than 10,000 Google engineers.
  • Prototype and productionize LLM-based features that parse unstructured incident data, automatically file risk tickets, and suggest reliability fixes.
  • Partner closely with Product Management, Data Science, and leadership to align multiple organizations on a unified approach to policy measurement and enforcement.

Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity. This applies regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.

Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.

To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.

Equity is granted exclusively and discretionarily by Alphabet Inc. on the basis of an agreement concluded between you and Alphabet Inc. Alphabet Inc. is your sole contractual partner with respect to equity grants. GSU grants are not guaranteed. They are discretionary and are subject to approval by the Alphabet Inc. board of directors or its delegate, the terms of the relevant Alphabet Inc. stock plan, and your grant agreement. They have no impact on statutory payments. Current or past grants do not confer an acquired right.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you, based on the Staff Site Reliability Engineer, Cloud Reliability Intelligence vacancy in Sunnyvale, CA
  •  ...GPU-based hyperscale cloud inference services. This...  ...and increasing intelligence via additional agentic...  ...powered by the Wafer‑Scale Engine (WSE). This team will...  ...deliver world‑class, ultra‑reliable inference...  ...frontier labs. As a Staff SRE, you will lead the... 
    Intelligence
    Shift work

    Cerebras Systems, Inc.

    Sunnyvale, CA
    1 day ago
  • $148k - $235.75k

     ...will be working as a Senior SRE Engineer. The position will be part of a...  ...Processors, Deep Learning, Artificial Intelligence and Driverless Cars to cater to...  .... Maintain uptime, reliability and readiness of on-prem engineering cloud spread across multiple data centers... 
    Intelligence
    Remote work

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $185k - $278k

     ...central to the Era of Pervasive Intelligence, from self-driving cars to...  .... * Debugging OS and engineering issues within our provided Linux...  ...Have: * Enhancing the reliability and performance of our engineering...  ..., Large scale private cloud implementation as well as GPU... 
    Intelligence
    Remote work

    Synopsys

    Sunnyvale, CA
    2 days ago
  • $152k - $241.5k

     ...developments in Artificial Intelligence, High-Performance Computing...  ...globally distributed, multi‑cloud hybrid environment - On‑prem...  ...lifecycle management, fleet reliability/auto‑healing, E2E observability...  ..., or Ruby. Mentored other engineers and influenced technical direction... 
    Intelligence

    NVIDIA Group

    Santa Clara, CA
    3 days ago
  • $65 - $85 per hour

     ...for over 25 years. We are looking for a Site Reliability Engineer to support our client's team based out...  ...Processors, Deep Learning, Artificial Intelligence and Driverless Cars to cater to their infrastructure needs. These cloud services provide almost half a million... 
    Intelligence
    Full time
    Contract work
    Worldwide

    Sustainable Talent

    Santa Clara, CA
    15 hours ago
  • Google Inc. is seeking a Staff Site Reliability Engineer in Sunnyvale, CA, who will manage the technical roadmap for the Evergreen platform and optimize...  ...teams and a focus on reliability and efficiency in Google Cloud services, leveraging large-scale architecture knowledge... 
    Intelligence

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $170k - $230k

     ...Site Reliability Engineer (SRE) Palo Alto / San Francisco Bay Area About Mithril Mithril is...  ...Most Innovative Company in Artificial Intelligence for 2026. Our engineering team is...  ...platform across a heterogeneous, multi-cloud environment. About the Opportunity... 
    Intelligence
    Work at office
    Local area
    1 day per week

    Mithril

    Palo Alto, CA
    4 days ago
  • $175k - $250k

     ...Staff Site Reliability Engineer Figure is an AI robotics company developing autonomous general-purpose...  ...humanoid robots with human level intelligence. Its robots are engineered to perform...  ...responsible for setting up and managing cloud and on-prem infrastructure to... 
    Intelligence
    Full time

    Figure

    San Jose, CA
    2 days ago
  •  ...Machine Learning (ML) and Artificial Intelligence (AI) technologies. Our mission is to...  ...data centers. We are seeking a skilled Site Reliability Engineer to join our growing team. The ideal...  ...and performance of our hybrid-based (Cloud & On-Prem) platform while supporting... 
    Intelligence
    Work at office
    Weekend work

    FLUIX

    Palo Alto, CA
    5 days ago
  • A leading technology company is seeking a Site Reliability Engineer in Cupertino, California. The role involves owning the reliability of AWS and Kubernetes services, designing systems, and collaborating with engineering teams for observability and automation. Candidates... 

    Apple Inc.

    Cupertino, CA
    5 days ago
  • $176k - $276k

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high...  ...management, continuous delivery and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE... 

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • A leading tech recruiting firm is seeking a Site Reliability Engineer to manage and optimize cloud infrastructure primarily using GCP or AWS. The role involves maintaining high availability through Kubernetes clusters and improving CI/CD pipelines with Terraform. Ideal... 

    Amiri Recruiting

    Mountain View, CA
    5 days ago
  • $120.3k - $194.53k

     ...that drives great outcomes. Job Summary Palo Alto Networks runs a large hybrid infrastructure across multiple public clouds. As a Site Reliability Engineer on the Internet Security Platform team, you will be part of a team supporting Advanced DNS Security services. This... 
    Full time
    Work at office
    Visa sponsorship
    Work visa

    Palo Alto Networks, Inc.

    Santa Clara, CA
    1 day ago
  • $83k - $187k

     ...Senior Site Reliability Engineer OCI Incident Response is the first line of defense in maintaining the high availability of Oracle's cloud. We minimize customer-impacting events by making them shorter, less frequent, and less impactful through large-scale incident... 
    Temporary work
    Work experience placement
    Flexible hours

    Oracle

    Santa Clara, CA
    3 hours ago
  • $170k - $200k

     ...Site Reliability Engineer We are seeking a talented and motivated Site Reliability Engineer to join...  ..., maintaining, and troubleshooting cloud service/cluster, infrastructure, and...  ...Fortinet empowers its customers with intelligent, seamless protection across the expanding... 
    Full time
    Worldwide

    Edelman

    Sunnyvale, CA
    1 day ago
  • $145k - $175k

     ...Site Reliability Engineer (SRE) Bolt Graphics is a semiconductor startup based in Sunnyvale, CA building the fastest and most efficient graphics processors. We pride ourselves on our first principles approach to solving problems. We are energized by our mission to reduce... 
    Work at office
    Immediate start
    Work from home

    Bolt Graphics

    Sunnyvale, CA
    4 hours ago
  •  ...threats across hybrid multi-cloud environments - stopping the spread...  ...running. Location: 5 on-site days a week in Sunnyvale, CA...  ...Our Team's Vision: Our Engineering team is driven by a culture that...  ...work on enhancing system reliability and scalability of Illumio SaaS... 
    Work experience placement
    Immediate start

    Illumio

    Sunnyvale, CA
    4 days ago
  •  ...Site Reliability Engineer, Enterprise Technology Services At Apple, groundbreaking ideas quickly transform into extraordinary products and services...  ..., build, and manage robust, distributed systems across cloud and on-premise infrastructure. Develop advanced capacity... 
    Worldwide
    Relocation

    Apple

    Sunnyvale, CA
    4 hours ago
  • $128k - $216k

     ...Sr. Site Reliability Engineer Clover is a pioneer in the fintech space, dedicated to transforming...  ...seamless payment processing to inventory and staff management. With over 15 billion...  ...with the latest and greatest of what cloud technology at blistering scale has to... 
    Worldwide

    Fiserv

    Sunnyvale, CA
    1 day ago
  • $150k - $175k

     ...Site Reliability Engineer At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. To achieve that, we're guided by principles that shape how we think, build, and execute. We value customer obsession, purposeful speed... 
    Remote work

    ASAPP

    Mountain View, CA
    3 days ago
  • $181.69k - $213.75k

     ...Senior Site Reliability Engineer San Francisco, California; Santa Clara, California; Seattle, WA The Company You'll Join Carta connects...  ...are Software and Infrastructure Engineers who specialize in cloud computing, networking, systems design and architecture,... 
    Full time
    Work at office

    Carta

    Santa Clara, CA
    2 days ago
  •  ...complex, distributed, cloud-native systems. As a Staff Platform Engineer, you will play a critical...  ...role. You will own reliability for major platform domains...  ...Platform Engineering, or Site Reliability Engineering...  ...We may use artificial intelligence (AI) tools to support... 
    Intelligence

    Saviynt

    Milpitas, CA
    23 days ago
  •  ...As we scale globally, reliability, availability, and performance...  .... As a Principal  Engineer, you will define and...  ...faster in a multi-cloud environment •...  ...Platform Engineering, or Site Reliability Engineering...  ...We may use artificial intelligence (AI) tools to support... 
    Intelligence

    Saviynt

    Milpitas, CA
    2 days ago
  • $126k - $204.5k

     ..., you will collaborate closely with our engineering teams to develop innovative solutions that...  ...Utilize your expertise in monitoring cloud platforms, particularly GCP, to optimize...  ...operability of the product and ensure the reliability and availability of our services.... 
    Full time
    Work at office

    Palo Alto Networks

    Santa Clara, CA
    1 day ago
  • $252k - $308k

     ...Staff Site Reliability Engineer Mountain View, US About EarnIn As one of the first pioneers of earned wage access, our passion at EarnIn is...  ...real human response. ~ Strong infrastructure-as-code and cloud infrastructure experience, including Terraform, Kubernetes... 
    Full time
    Work at office
    2 days per week

    Earnin

    Mountain View, CA
    1 day ago
  • $200k - $260k

     ...full-scale Work AI ecosystem, powering intelligent Search, an AI Assistant, and scalable...  ...the Role: Glean is seeking a Site Reliability Engineering Lead to foster a culture of engineering...  ...production environments in the cloud. You'll lead a team and be responsible... 
    Work at office
    Home office
    Flexible hours

    Glean.info

    Mountain View, CA
    4 days ago
  • $147k - $237.5k

     ...consisting of XDR, XSIAM, XSOAR, and XPANSE. As a Principal Site Reliability Engineer within the Cortex DevOps team, you will serve as a...  ...efficiency at global scale. This role requires deep expertise in cloud infrastructure, observability, distributed systems,... 
    Full time
    Work at office

    Palo Alto Networks

    Santa Clara, CA
    1 day ago
  •  ...South America, and Europe Role Overview Seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available...  ...years of experience building and running production‑grade cloud infrastructure. The right person understands where... 

    Prophet Town

    Mountain View, CA
    2 days ago
  • $147.4k - $272.1k

     ...accomplish. We are a team of software engineers developing web-based tools and native applications...  ...every day. We’re looking for a Site Reliability Engineer who thinks like a systems...  ...pipelines, and a move from on‑prem to cloud‑native infrastructure. We need someone... 
    Relocation
    Shift work

    Apple Inc.

    Cupertino, CA
    4 days ago
  • Site Reliability Engineer Onsite- Bay Area, CA Skills Relevant Skills and Experience What You’ll Do (Day-to-Day) Own and manage our cloud infrastructure (GCP or AWS, on-prem). Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters). Implement... 

    Amiri Recruiting

    Santa Clara, CA
    1 day ago
