Software Engineer, Site Reliability
$180k - $250k
Fal
You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems, from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.

Key Responsibilities
- Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
- Build and maintain CI/CD pipelines and deployment infrastructure
- Leverage AI to an extreme level to automate analysis and resolution of production issues, and improve software development speed, reliability and maintainability
- Build dashboards, alerting, and anomaly detection across our systems
- Define and enforce SLOs and build out incident response processes
- Manage and improve our networking, load balancing, and service mesh configurations
- Drive reliability improvements across the stack through automation, runbooks, and chaos engineering

Requirements
- 5+ years experience in managing critical production systems and software development workflows
- Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
- Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
- Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
- Proficiency in Python and either Go or Bash for tooling and automation
- Strong experience with logging, monitoring and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have
- Experience with managing GPU and AI/ML workloads
- Experience with kernel-based monitoring and routing (eBPF, XDP)
- Experience with security tooling (Falco, Coroot, SIEM)
- Experience with bare metal Kubernetes networking (Calico, Cilium, MetalLB)
- Experience with distributed storage systems (Ceph, Longhorn, etc.)

Compensation
$180,000-250,000 plus equity + benefits

What we offer at fal
- Interesting and challenging work
- A lot of learning and growth opportunities
- We are currently hiring in downtown San Francisco. We offer visa sponsorship and will help you relocate to San Francisco.
- Health, dental, and vision insurance (US)
$160k - $300k
...competitive advantage that drives performance, alpha, and market leadership. The Role We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them...Website
$325k
...Anthropic’s mission is to create reliable, interpretable, and steerable... ...of committed researchers, engineers, policy experts, and business... ...serving -- critical for both site reliability and Anthropic's... ...looking for reliability-minded software engineers and SREs. Are...Website, Visa sponsorship
$170k - $240k
SENIOR SOFTWARE ENGINEER - OBSERVABILITY AND RELIABILITY ABOUT THE ROLE We are growing the engineering team and looking for engineers who have the chops... ...Practices When you submit a job application on this site, Sigma processes your personal data for the purposes...Website, Full time, Work at office, Flexible hours
- ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes... ...Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes... .... Who You Are ~5+ years of software engineering experience with a strong backend...Website
- ...millions of daily users while enabling our engineering teams to ship fast. You'll own the... ...building automation and tooling that improves reliability and partnering with engineering to... ...services What you'll bring ~5+ years in Site Reliability Engineering, DevOps, or...Website, Work at office, Work from home
- ...shape the future of healthcare, we’d love to meet you. About the role We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that power our product. You’ll work across our distributed workflow...Website, Work at office, Remote work, Flexible hours, 2 days per week
- CloudDevs: Senior Site Reliability Engineer (SRE) CloudDevs works with fast-moving, venture-backed startups across the US. We’re building... ...in designing for scale and improving how teams ship software, you’ll fit right in. Key Responsibilities Work as a...Website
$180k - $200k
...additional in-office days for team or company events. Software Engineer, Platform Infrastructure sits under the umbrella of Product... ..., infrastructure, and systems to provide our customers with reliable, secure, and scalable software. Roles & Responsibilities:...Website, Contract work, Work at office
- ...Job Description Velia Multiservices is proud to partner with a fast-growing, early-stage startup to identify a top-tier Site Reliability Engineer who will play a critical role in scaling and strengthening a high-performance platform used by enterprise clients such as...Website
- A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and strong programming skills. You will manage production systems' reliability...Website
$148.5k - $223.9k
...Senior Member of Technical Staff (SMTS) - Site Reliability Engineer (Cloud Automation) Location: New York, NY; San Francisco, CA About... ...Bachelor's degree in Computer Science, Computer Engineering, Software Engineering or relevant work experience ~7+ years of...Website, Work experience placement, Shift work
- ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes... ...our Staff SRE Tech Lead, you'll own the reliability and scalability of our platform as we... ...stability. Who You Are ~8+ years of software engineering experience with a strong backend...Website
- ...Job Description Forhyre is looking for engineers who can bring unique perspectives and innovative... ...practices while building a culture of reliability and observability Engage in and improve the end to end lifecycle of software development--from inception and design, through...Website
- US Corp. is seeking a Lead Site Reliability Engineer to spearhead our mission of delivering highly available and performant systems. With an average... ..., the successful candidate will bridge the gap between software development and systems engineering. You will be...Website
- OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes...Website, Flexible hours
- Fieldguide is seeking a Senior Site Reliability Engineer to ensure the reliability and scalability of our production systems in San Francisco, CA. The role involves working closely with product teams to define reliability standards and build robust observability practices...Website, Remote job, Flexible hours
- TELCOR Inc is looking for a Site Reliability Engineer to ensure the reliability, scalability, and performance of our AI products' systems. The role involves designing and operating resilient systems in cloud and containerized environments while managing production infrastructure...Website, Remote job
$150k
...Description About The Role We are seeking an experienced Site Reliability Engineer (SRE) with a strong focus on DevSecOps to join our growing... ...hygiene of our cloud infrastructure, APIs, and software supply chain. You will drive patch management programs, harden...Website
- ...company in San Francisco seeks a Platform/DevOps Engineer to manage and optimize CI/CD pipelines, enhance infrastructure reliability, and facilitate deployment across multiple... ...a flexible work environment, following an on-site requirement in San Francisco. ...Website, Flexible hours
$175k - $250k
WorkOS is seeking a Site Reliability Engineer to enhance the reliability and performance of our systems. As a key member of the SRE team, you will handle critical responsibilities like improving incident responses and collaborating...Website, Remote job, Flexible hours
$129.3k
...Software Development Engineer, Kuiper Trust Services Job ID: 3126384 | Amazon.com Services LLC Locations... ...or architecture (design patterns, reliability, and scaling) of new and existing... ...Applicants should apply via our internal or external career site. ...Website, Permanent employment, Internship, Flexible hours
$163k - $203k
...will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role: you will maintain the applications that run on our...Website, Work experience placement, Work at office, Local area, Remote work, Flexible hours, 2 days per week
- We are seeking a Sr. Site Reliability Engineer to join our team and run critical infrastructure for our blockchain and web applications. You’ll... ...Developer A seasoned developer with a solid foundation in software engineering, particularly in backend development. Someone...Website, Remote job
$130k - $155k
...endless fun and challenging engineering problems across search, discovery... ...performant, scalable, and reliable solutions that enable us to scale... ...developing a best-in-class software development process... ...LinkedIn (preferred), personal site, or GitHub...Website, Full time, Work at office, Remote work, Flexible hours
- ...manifesto. About the Role We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we... ...This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and...Website, Worldwide, Shift work
- ...co‑founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing. About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and...Website
$227.2k - $324.5k
...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems....Website, Full time, Contract work, Temporary work, Local area, Flexible hours
- ...that significantly outperforms individual engineers. We combine language models with human ingenuity to push the boundaries of software development efficiency and quality. The Role We are seeking an experienced Site Reliability Engineer to join our Platform Engineering...Website
- ...back and when to dive deep. We call this role a Cloud Service Reliability Engineer. The Cloud Service Reliability Engineer will be... ...automating infrastructure, service delivery, and engineering site reliability, maintaining infrastructure on premise and in cloud...Website
$150k - $170k
Claryo, Inc. is seeking an Integration Reliability Engineer in San Francisco, CA, responsible for ensuring the reliability of systems across cloud and edge environments. The candidate will build and maintain observability tools and improve incident response processes....Website



