Senior Site Reliability Engineer
OutSystems, Inc.
Hybrid onsite in Menlo Park, CA.

Responsibilities
- Lead and onboard services and teams to the reliability tenets.
- Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Design and implement scalable, reliable, and secure infrastructure, ensuring cloud-native best practices.
- Collaborate with software development teams to build resilient, observable, fault-tolerant, recoverable, and scalable systems.
- Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents.
- Lead incident response efforts, ensuring rapid resolution and minimal downtime, and conduct root-cause analysis (RCA) and post-mortems.
- Automate operational tasks, focusing on fast incident detection and recovery.
- Program in Python, using Gen AI tooling to accelerate automation and tool development.
- Foster a culture of continuous improvement and knowledge sharing.
- Communicate effectively with stakeholders, providing updates on system reliability and performance.
- Participate in on-call rotation to provide 24/7 support for production systems.

Performance Indicators
- SLA and SLO compliance
- SLO coverage and detection ratio
- Mean time to acknowledge (MTTA)
- Mean time to resolve (MTTR)

Qualifications
- Bachelor's or Master's degree in Computer Science or equivalent.
- 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale.
- History of end-to-end project delivery.
- Experience managing Hadoop and Kubernetes infrastructure or equivalent.
- Advanced knowledge of Linux, networking, and containers.
- Proficiency in at least one high-level programming language (Python, Go, etc.).
- Strong troubleshooting and debugging skills.
- Fluency in English with excellent communication skills.
- Experience with prompt engineering, AI-native IDEs, or AI assistants such as Cursor, GitHub Copilot, or Claude.

Technical Skills
- Establishment, monitoring, and improvement of SLOs, SLIs, and SLAs aligned with business needs.
- Containerization technologies and orchestration platforms, mainly Kubernetes and EKS (CKA, CKAD, and CKS certifications are valued).
- Automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, and Spacelift.
- Python, Go, Bash/Shell scripting, or other automation languages.
- Familiarity with AWS services such as EC2, RDS, ELB, CloudFront, and Lambda.
- Monitoring and troubleshooting complex distributed systems using Grafana, ELK stack, Prometheus, or similar.
- Designing resilient and fault-tolerant systems; debugging complex distributed systems.

Soft Skills
- Effective communication (oral and written) in English, with empathy for stakeholders.
- Collaboration and proactive presentation of ideas to leadership.
- Humility: admitting mistakes, mitigating impact, and learning from errors.
- Accountability: owning problems and driving them to resolution.
- Negotiation skills: defusing conflicts and leading toward mutual agreement.
- Process orientation: following defined processes while challenging inefficiencies.
- Problem-solving and critical thinking: breaking problems into smaller parts and analyzing them objectively.

EEO Statement
As an equal opportunity employer, OutSystems gives all qualified applicants equal consideration regardless of race, origin, religion, sex, sexual orientation, gender identity, disability, veteran status, or any other protected status.