Lead Site Reliability Engineer

$200k - $260k

Softbank Investment Advisers

Glean is the Work AI platform that helps everyone work smarter with AI. What began as the industry's most advanced enterprise search has evolved into a full-scale Work AI ecosystem, powering intelligent Search, an AI Assistant, and scalable AI agents on one secure, open platform. With over 100 enterprise SaaS connectors, flexible LLM choice, and robust APIs, Glean gives organizations the infrastructure to govern, scale, and customize AI across their entire business - without vendor lock-in or costly implementation cycles.

At its core, Glean is redefining how enterprises find, use, and act on knowledge. Its Enterprise Graph and Personal Knowledge Graph map the relationships between people, content, and activity, delivering deeply personalized, context-aware responses for every employee. This foundation powers Glean's agentic capabilities - AI agents that automate real work across teams by accessing the industry's broadest range of data: enterprise and world, structured and unstructured, historical and real-time. The result: measurable business impact through faster onboarding, hours of productivity gained each week, and smarter, safer decisions at every level.

Recognized by Fast Company as one of the World's Most Innovative Companies (Top 10, 2025), by CNBC's Disruptor 50, Bloomberg's AI Startups to Watch (2026), Forbes AI 50, and Gartner's Tech Innovators in Agentic AI, Glean continues to accelerate its global impact. With customers across 50+ industries and 1,000+ employees in more than 25 countries, we're helping the world's largest organizations make every employee AI-fluent, and turning the superintelligent enterprise from concept into reality.

If you're excited to shape how the world works, you'll help build systems used daily across Microsoft Teams, Zoom, ServiceNow, Zendesk, GitHub, and many more - deeply embedded where people get things done. You'll ship agentic capabilities on an open, extensible stack, with the craft and care required for enterprise trust, as we bring Work AI to every employee, in every company.

About the Role:

Glean is seeking a Site Reliability Engineering Lead to foster a culture of engineering excellence, drive technical strategy, and develop a high-performing, collaborative team. Your role is pivotal in ensuring our services meet stringent Service Level Objectives (SLOs) and in building resilient, automated production environments in the cloud. You'll lead a team and be responsible for products globally, providing technical leadership to key projects and empowering your team to do the same.

Much of our software development focuses on building infrastructure to scale our operations in a hybrid cloud environment and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale and fast growth which are unique to Glean, while using your expertise in coding, algorithms, problem-solving, and SRE practices. We keep Glean applications up and running, ensuring our customers have the best and most reliable experience possible.

You will:

  • Technical Leadership and Mentorship: Play a key role in driving technical excellence and fostering a culture of reliability across engineering teams. You will lead by example, setting best practices for incident management, performance optimization, and automation. Influence best practices, drive cross-team collaborations, and contribute to the execution of key objectives in alignment with engineering leadership and cross-functional partners. Establish strong technical credibility, shaping architectural decisions and ensuring the delivery of high-quality, reliable systems.
  • Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure.
  • Incident Management: Participate in the primary on-call rotation; cultivate technical curiosity, a growth mindset, and a blameless postmortem culture within the team. Continuously optimize the on-call process for sustainability and efficiency.
  • Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
  • Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
  • Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
  • Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
  • Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable SRE insights during launch reviews, influencing and enhancing system architecture.

About you:

  • Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
  • 8+ years of experience in a senior-level role within Site Reliability Engineering or a similar field, particularly in managing cloud-based services and infrastructure.
  • 5+ years of experience with software development in one or more programming languages.
  • 3+ years of experience managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems running in the cloud.
  • Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure.
  • Practical experience with containerization technologies, including Docker and Kubernetes. Familiarity with infrastructure as code tools like Terraform is essential.
  • Solid understanding of networking, security principles, and best SRE and security practices.
  • Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

Location:

  • This role is hybrid (4 days a week in our Mountain View office).

Compensation & Benefits:

The standard base salary range for this position is $200,000 - $260,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

We offer a comprehensive benefits package including competitive compensation, Medical, Vision, and Dental coverage, a generous time-off policy, and the opportunity to contribute to your 401k plan to support your long-term goals. When you join, you'll receive a home office improvement stipend, as well as annual education and wellness stipends to support your growth and wellbeing. We foster a vibrant company culture through regular events, and provide healthy lunches daily to keep you fueled and focused.

We are a diverse bunch of people and we want to continue to attract and retain a diverse range of people into our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.

Vacancy posted 12 hours ago