Senior Site Reliability Engineer

OutSystems, Inc.

Hybrid onsite in Menlo Park, CA.

Responsibilities
  • Lead and onboard services and teams to the reliability tenets.
  • Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Design and implement scalable, reliable, and secure infrastructure, ensuring cloud‑native best practices.
  • Collaborate with software development teams to build resilient, observable, fault‑tolerant, recoverable, and scalable systems.
  • Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents.
  • Lead incident response efforts, ensuring rapid resolution and minimal downtime, and conduct root‑cause analysis (RCA) and post‑mortems.
  • Automate operational tasks, focusing on fast incident detection and recovery.
  • Program in Python, using Gen AI tooling to accelerate automation and tool development.
  • Foster a culture of continuous improvement and knowledge sharing.
  • Communicate effectively with stakeholders, providing updates on system reliability and performance.
  • Participate in on‑call rotation to provide 24/7 support for production systems.

Performance Indicators
  • SLA and Service Level Objective (SLO) compliance
  • SLO coverage and detection ratio
  • Mean time to acknowledge (MTTA)
  • Mean time to resolve (MTTR)

Qualifications
  • Bachelor’s or Master’s degree in Computer Science or equivalent.
  • 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale.
  • History of end‑to‑end project delivery.
  • Experience managing Hadoop and Kubernetes infrastructure or equivalent.
  • Advanced knowledge of Linux, networking, and containers.
  • Proficiency in at least one high‑level programming language (Python, Go, etc.).
  • Strong troubleshooting and debugging skills.
  • Fluency in English with excellent communication skills.
  • Experience with prompt engineering, AI‑native IDEs, or AI assistants such as Cursor, GitHub Copilot, or Claude.

Technical Skills
  • Establishment, monitoring, and improvement of SLOs, SLIs, and SLAs aligned with business needs.
  • Containerization technologies and orchestration platforms, mainly Kubernetes and EKS (CKA, CKAD, CKS certifications are valued).
  • Automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, Spacelift, etc.
  • Python, Go, Bash/Shell scripting, or other automation languages.
  • Familiarity with AWS services like EC2, RDS, ELB, CloudFront, Lambda, etc.
  • Monitoring and troubleshooting complex distributed systems using Grafana, ELK stack, Prometheus, or similar.
  • Designing resilient and fault‑tolerant systems; debugging complex distributed systems.

Soft Skills
  • Effective communication (oral and written) in English, with empathy for stakeholders.
  • Collaboration and proactive presentation of ideas to leadership.
  • Humbleness: admitting mistakes, mitigating impact, and learning from errors.
  • Accountability: owning problems and driving them to resolution.
  • Negotiation skills: defusing conflicts and leading toward mutual agreement.
  • Process orientation: following defined processes while challenging inefficiencies.
  • Problem‑solving and critical thinking: breaking problems into smaller parts and analyzing objectively.

EEO Statement
As an equal opportunity employer, all qualified applicants receive equal consideration regardless of race, origin, religion, sex, sexual orientation, gender identity, disability, veteran status, or any other protected status.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Senior Site Reliability Engineer in San Francisco, CA vacancy
  •  ...The TeamPlatform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational...  ...fleet, alongside the critical components that ensure cluster reliability and security (e.g., CoreDNS, cert-manager, and Gatekeeper). As... 
    Senior
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    3 days ago
  • $140k - $205k

     ...Senior Technology Site Reliability Engineer Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team. Position summary: The Senior Technology Site Reliability Engineer ("SRE") is responsible for ensuring the reliability... 
    Senior
    Full time
    Temporary work
    Work at office
    Flexible hours
    Weekend work

    Cooley

    San Francisco, CA
    3 days ago
  • $210.6k - $305.1k

     ...Qualifications: ~ You have led a distributed team of 5+ engineers, can demonstrate strong technical vision for your team, and ensure...  ..., and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible... 
    Senior
    Full time
    Temporary work
    Local area
    Flexible hours

    Cisco

    San Francisco, CA
    3 days ago
  • $227.2k - $324.5k

     ...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization...  ...automation. We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability... 
    Senior
    Full time
    Contract work
    Temporary work
    Local area
    Flexible hours

    Tubi

    San Francisco, CA
    2 days ago
  • OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... 
    Senior
    Flexible hours

    OutSystems, Inc.

    San Francisco, CA
    4 days ago
  • $140k - $185k

     ...alongside clinicians to make that possible. We’re a team of doctors, engineers, designers, researchers, and creatives building tools that...  ...in on-call and incident response: Improve operational reliability: Own parts of the production environment: Strengthen observability... 
    Senior
    Work at office
    Worldwide

    Dormont Manufacturing Co

    San Francisco, CA
    4 days ago
  • $166.9k - $225.9k

    Job Summary Drata's SRE team operates as both a central engineering function and an embedded reliability practice. You'll be part of a close-knit SRE team...  ...organization. What you’ll bring 6+ years of experience in Site Reliability Engineering, Cloud Engineering, or... 
    Senior
    Flexible hours

    Drata

    San Francisco, CA
    17 hours ago
  • $325k

    Engineering at Ivo: Engineers at Ivo are inventors. Ivo was first-to-market with an AI agent that lives in MS Word and edits...  ...expect us to hit our SLAs. We’re looking for a Senior or Staff level Site Reliability Engineer as part of the Infrastructure team to: Own uptime... 
    Senior
    Contract work

    Icehouseventures

    San Francisco, CA
    3 days ago
  • $15 per hour

    Summary The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to support and develop the platform serving the world’s favorite encyclopedia, Wikipedia, to millions of people around the globe. Wikimedia’s Site Reliability Engineering (SRE) team is... 
    Senior
    Permanent employment
    For contractors
    Remote work

    Nerdleveltech

    San Francisco, CA
    3 days ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco · Full-Time About Andromeda Andromeda Cluster was founded by Nat Friedman and Daniel Gross to give early-stage startups access to the kind of scaled AI infrastructure once reserved... 
    Senior
    Full time
    Remote work

    Andromeda

    San Francisco, CA
    4 days ago
  • $232k - $319k

     ...to help us continue to scale the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and...  ...platform capabilities in partnership with architects and product engineering Build a world-class observability platform and monitoring... 
    Senior
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta, Inc.

    San Francisco, CA
    4 days ago
  •  ...cloud‑native systems. As a Staff Platform Engineer, you will play a critical role in...  ...technical leadership role. You will own reliability for major platform domains, design scalable...  ...Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a... 
    Senior

    Saviynt

    San Francisco, CA
    4 days ago
  • $180k - $210k

    Location Remote US Employment Type Full time Location Type Remote Department Tech Engineering Compensation $180K - $210K • Offers Equity The base salary & equity offered for this position will depend on several factors, including location, experience, qualifications... 
    Senior
    Remote job
    Full time
    H1b
    Work at office
    Worldwide
    Visa sponsorship
    Flexible hours

    Twelve Labs

    San Francisco, CA
    2 days ago
  • $127k - $249k

    We are looking for an experienced Senior or Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Staff SRE, you will be very hands-on technically while also mentoring a small team of SREs. The InfraSec team collaborates... 
    Senior
    Full time
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    2 days ago
  • $163k - $203k

     ...Your role in our mission: You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud...  ...Platform portfolio. This is as much a platform engineering role as it is an SRE role: you will maintain... 
    Senior
    Work experience placement
    Work at office
    Local area
    Remote work
    Flexible hours
    2 days per week

    Prosper.com

    San Francisco, CA
    3 days ago
  • $163k - $203k

    Your role in our mission: You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role: you will maintain the... 
    Senior
    Work experience placement
    Work at office
    Remote work
    Flexible hours
    2 days per week

    GoTo Meeting

    San Francisco, CA
    4 days ago
  • A leading biotechnology firm in South San Francisco is seeking a Site Reliability Engineer to architect and implement Infrastructure as Code (IaC) solutions that enhance cloud-based platform solutions for Machine Learning and HPC workloads. The ideal candidate has extensive... 
    Senior
    3 days per week

    Genentech

    South San Francisco, CA
    4 days ago
  •  ...Bachelor's degree in Computer Science, related technical field, or equivalent practical experience, 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems, Deep expertise in SRE principles and practices,... 
    Senior

    Fireworks AI

    San Francisco, CA
    4 days ago
  • $104k - $130k

     ...infrastructure as well as help improve the reliability, quality of services and overall...  ...recovery.  You’ll collaborate or embed with engineering teams, helping them to improve the...  ...more about our locations by visiting our site. Compensation & Benefits The base... 
    Full time
    Work experience placement

    AppFolio

    San Francisco, CA
    3 days ago
  • $210k - $300k

     ...Site Reliability Engineer (SRE) / DevOps Engineer Location: Onsite in NYC or San Francisco Compensation: $210,000–$300,000 Base Salary About the Role We are seeking an experienced Site Reliability Engineer (SRE) / DevOps Engineer to help build, scale, and operate... 

    TechLine Consulting

    San Francisco, CA
    1 day ago
  • $261k - $326k

    A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands over 15 years of experience in production engineering or related fields and involves setting technical directions... 
    Senior

    Crusoe

    San Francisco, CA
    4 days ago
  • $238k - $290k

     ...professional services is being written today — and we're just getting started. Role Overview As a Staff Software Engineer on the Site Reliability team at Harvey, you will ensure the reliability, scalability, and performance of our legal AI platform. You'll join a... 
    Relocation package

    Harvey

    San Francisco, CA
    4 days ago
  • $300k

     ...thousands of H100s, H200s, and B200s, ready for experimentation, full-scale model training, or inference. As a Platform Engineer/Senior Site Reliability Engineer, you’ll own the reliability, performance, and automation of this GPU-powered infrastructure, ensuring... 
    Senior
    Permanent employment
    San Francisco, CA
    more than 2 months ago
  • $250k

     ...across Europe, while now significantly expanding its footprint in the United States. The company is looking for a Senior / Staff Site Reliability Engineer to support and scale large-scale HPC and cloud environments powering GPU-intensive workloads. The role involves... 
    Senior
    Permanent employment
    Remote work
    San Francisco, CA
    7 days ago
  • $125k - $165k

    Position: Site Reliability Engineer Location: San Francisco, CA Job Id: 434 # of Openings: 1 TELCOR Inc, a leading innovator in laboratory software, is looking for a Site Reliability Engineer to join our TELCOR AI Systems team! Do you have strong experience in cloud... 
    Temporary work
    Work at office
    Visa sponsorship
    Work visa
    Relocation package
    Flexible hours

    TELCOR

    San Francisco, CA
    2 days ago
  • An innovative tech platform is seeking a Senior Principal Software Engineer to lead the development of its next-gen API Platform. The role involves defining the technical vision, collaborating with various departments, and mentoring other engineers. The ideal candidate... 
    Senior
    Remote work

    jobright.com

    San Francisco, CA
    2 days ago
  •  ...Job Description Forhyre is looking for engineers who can bring unique perspectives and innovative ideas to all areas...  ...evangelize cloud best practices while building a culture of reliability and observability Engage in and improve the end to end lifecycle... 

    Forhyre

    San Francisco, CA
    13 days ago
  • A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates... 
    Senior

    OpenAI

    San Francisco, CA
    17 hours ago
  •  ...Job Description Velia Multiservices is proud to partner with a fast-growing, early-stage startup to identify a top-tier Site Reliability Engineer who will play a critical role in scaling and strengthening a high-performance platform used by enterprise clients such as... 

    Velia multiservices

    San Francisco, CA
    13 days ago
  • A leading AI research company based in San Francisco is seeking a skilled software engineer with over 5 years of experience, including 2 years at a top-tier product company. The role involves evaluating AI-generated code, collaborating on AI-driven solutions, and designing... 
    Senior
    For contractors
    Remote work
    Flexible hours

    Turing

    San Francisco, CA
    1 day ago
