Software Engineer, Site Reliability

$180k - $250k

Fal

You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems, from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.

Key Responsibilities
  • Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
  • Build and maintain CI/CD pipelines and deployment infrastructure
  • Leverage AI extensively to automate the analysis and resolution of production issues, and to improve software development speed, reliability, and maintainability
  • Build dashboards, alerting, and anomaly detection across our systems
  • Define and enforce SLOs and build out incident response processes
  • Manage and improve our networking, load balancing, and service mesh configurations
  • Drive reliability improvements across the stack through automation, runbooks, and chaos engineering

Requirements
  • 5+ years of experience managing critical production systems and software development workflows
  • Strong production experience setting up and operating Kubernetes at scale using infrastructure-as-code (Terraform, Ansible)
  • Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
  • Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
  • Proficiency in Python and either Go or Bash for tooling and automation
  • Strong experience with logging, monitoring, and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
  • Excellent communication skills and the ability to drive technical decisions across teams
  • A self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have
  • Experience managing GPU and AI/ML workloads
  • Experience with kernel-based monitoring and routing (eBPF, XDP)
  • Experience with security tooling (Falco, Coroot, SIEM)
  • Experience with bare-metal Kubernetes networking (Calico, Cilium, MetalLB)
  • Experience with distributed storage systems (Ceph, Longhorn, etc.)

Compensation
$180,000-$250,000 plus equity and benefits

What we offer at fal
  • Interesting and challenging work
  • Plenty of learning and growth opportunities
  • Visa sponsorship and relocation assistance; we are currently hiring in downtown San Francisco
  • Health, dental, and vision insurance (US)

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Software Engineer, Site Reliability vacancy in San Francisco, CA
  • $160k - $300k

     ...competitive advantage that drives performance, alpha, and market leadership. The Role We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them... 
    Website

    Hebbia, Inc.

    San Francisco, CA
    2 days ago
  • $325k

     ...Anthropic’s mission is to create reliable, interpretable, and steerable...  ...of committed researchers, engineers, policy experts, and business...  ...serving -- critical for both site reliability and Anthropic's...  ...looking for reliability-minded software engineers and SREs. Are... 
    Website
    Visa sponsorship

    Menlo Ventures

    San Francisco, CA
    4 days ago
  • $170k - $240k

    SENIOR SOFTWARE ENGINEER - OBSERVABILITY AND RELIABILITY ABOUT THE ROLE We are growing the engineering team and looking for engineers who have the chops...  ...Practices When you submit a job application on this site, Sigma processes your personal data for the purposes... 
    Website
    Full time
    Work at office
    Flexible hours

    Sigma Computing

    San Francisco, CA
    1 day ago
  •  ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes...  ...Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes...  .... Who You Are ~5+ years of software engineering experience with a strong backend... 
    Website

    Unify

    San Francisco, CA
    2 days ago
  •  ...millions of daily users while enabling our engineering teams to ship fast. You'll own the...  ...building automation and tooling that improves reliability and partnering with engineering to...  ...services What you'll bring ~5+ years in Site Reliability Engineering, DevOps, or... 
    Website
    Work at office
    Work from home

    gamma.app

    San Francisco, CA
    26 minutes ago
  •  ...shape the future of healthcare, we’d love to meet you. About the role We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that power our product. You’ll work across our distributed workflow... 
    Website
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    26 minutes ago
  • CloudDevs: Senior Site Reliability Engineer (SRE) CloudDevs works with fast-moving, venture-backed startups throughout the US. We're building...  ...in designing for scale and improving how teams ship software, you'll fit right in. Key Responsibilities Work as a... 
    Website

    The10minutecareersolution

    San Francisco, CA
    4 days ago
  • $180k - $200k

     ...additional in-office days for team or company events. Software Engineer, Platform Infrastructure sits under the umbrella of Product...  ..., infrastructure, and systems to provide our customers with reliable, secure, and scalable software. Roles & Responsibilities:... 
    Website
    Contract work
    Work at office

    PactSafe (acquired by Ironclad)

    San Francisco, CA
    more than 2 months ago
  •  ...Job Description Velia Multiservices is proud to partner with a fast-growing, early-stage startup to identify a top-tier Site Reliability Engineer who will play a critical role in scaling and strengthening a high-performance platform used by enterprise clients such as... 
    Website

    Velia multiservices

    San Francisco, CA
    26 days ago
  • A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and strong programming skills. You will manage production systems' reliability... 
    Website

    gamma.app

    San Francisco, CA
    5 days ago
  • $148.5k - $223.9k

     ...Senior Member of Technical Staff (SMTS) - Site Reliability Engineer (Cloud Automation) Location: New York, NY; San Francisco, CA About...  ...Bachelor's degree in Computer Science, Computer Engineering, Software Engineering or relevant work experience ~7+ years of... 
    Website
    Work experience placement
    Shift work

    Salesforce

    San Francisco, CA
    1 day ago
  •  ...Connor was a machine learning research engineer at Scale AI. The rest of our team comes...  ...our Staff SRE Tech Lead, you'll own the reliability and scalability of our platform as we...  ...stability. Who You Are ~8+ years of software engineering experience with a strong backend... 
    Website

    Unify

    San Francisco, CA
    2 days ago
  •  ...Job Description Forhyre is looking for engineers who can bring unique perspectives and innovative...  ...practices while building a culture of reliability and observability Engage in and improve the end to end lifecycle of software development--from inception and design, through... 
    Website

    Forhyre

    San Francisco, CA
    26 days ago
  • US Corp. is seeking a Lead Site Reliability Engineer to spearhead our mission of delivering highly available and performant systems. With an average...  ..., the successful candidate will bridge the gap between software development and systems engineering. You will be... 
    Website

    Axiom Pursuits

    San Francisco, CA
    2 days ago
  • OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... 
    Website
    Flexible hours

    OutSystems, Inc.

    San Francisco, CA
    2 days ago
  • Fieldguide is seeking a Senior Site Reliability Engineer to ensure the reliability and scalability of our production systems in San Francisco, CA. The role involves working closely with product teams to define reliability standards and build robust observability practices... 
    Website
    Remote job
    Flexible hours

    Fieldguide

    San Francisco, CA
    5 days ago
  • TELCOR Inc is looking for a Site Reliability Engineer to ensure the reliability, scalability, and performance of our AI products' systems. The role involves designing and operating resilient systems in cloud and containerized environments while managing production infrastructure... 
    Website
    Remote job

    TELCOR Inc

    San Francisco, CA
    5 days ago
  • $150k

     ...Description About The Role We are seeking an experienced Site Reliability Engineer (SRE) with a strong focus on DevSecOps to join our growing...  ...hygiene of our cloud infrastructure, APIs, and software supply chain. You will drive patch management programs, harden... 
    Website

    VantageScore

    San Francisco, CA
    a month ago
  •  ...company in San Francisco seeks a Platform/DevOps Engineer to manage and optimize CI/CD pipelines, enhance infrastructure reliability, and facilitate deployment across multiple...  ...a flexible work environment, following an on-site requirement in San Francisco... 
    Website
    Flexible hours

    Untolabs

    San Francisco, CA
    5 days ago
  • $175k - $250k

    I did my part and supported the Regular Toilet is seeking a Site Reliability Engineer to enhance the reliability and performance of our systems at WorkOS. As a key member of the SRE team, you will handle critical responsibilities like improving incident responses and collaborating... 
    Website
    Remote job
    Flexible hours

    I did my part and supported the Regular Toilet

    San Francisco, CA
    2 days ago
  • $129.3k

     ...Software Development Engineer, Kuiper Trust Services Job ID: 3126384 | Amazon.com Services LLC Locations...  ...or architecture (design patterns, reliability, and scaling) of new and existing...  ...Applicants should apply via our internal or external career site... 
    Website
    Permanent employment
    Internship
    Flexible hours

    Amazon

    San Francisco, CA
    2 days ago
  • $163k - $203k

     ...will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much of a platform engineering role as it is SRE role — you will maintain the applications that run on our... 
    Website
    Work experience placement
    Work at office
    Local area
    Remote work
    Flexible hours
    2 days per week

    Prosper

    San Francisco, CA
    21 days ago
  • We are seeking a Sr. Site Reliability Engineer to join our team and run critical infrastructure for our blockchain and web applications. You’ll...  ...Developer A seasoned developer with a solid foundation in software engineering, particularly in backend development. Someone... 
    Website
    Remote job

    Blockchain Works

    San Francisco, CA
    4 days ago
  • $130k - $155k

     ...endless fun and challenging engineering problems across search, discovery...  ...performant, scalable, and reliable solutions that enable us to scale...  ...developing a best-in-class software development process...  ...LinkedIn (preferred), personal site, or GitHub... 
    Website
    Full time
    Work at office
    Remote work
    Flexible hours

    SupportFinity™

    San Francisco, CA
    2 days ago
  •  ...manifesto. About the Role We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we...  ...This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and... 
    Website
    Worldwide
    Shift work

    Happyrobot Inc.

    San Francisco, CA
    2 days ago
  •  ...co‑founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing. About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and... 
    Website

    deCircle

    San Francisco, CA
    1 day ago
  • $227.2k - $324.5k

     ...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems.... 
    Website
    Full time
    Contract work
    Temporary work
    Local area
    Flexible hours

    Tubi

    San Francisco, CA
    1 day ago
  •  ...that significantly outperforms individual engineers. We combine language models with human ingenuity to push the boundaries of software development efficiency and quality. The Role We are seeking an experienced Site Reliability Engineer to join our Platform Engineering... 
    Website

    CodeRabbit

    San Francisco, CA
    2 days ago
  •  ...back and when to dive deep. We call this role a Cloud Service Reliability Engineer. The Cloud Service Reliability Engineer will be...  ...automating infrastructure, service delivery, and engineering site reliability, maintaining infrastructure on premise and in cloud... 
    Website

    Forhyre

    San Francisco, CA
    25 days ago
  • $150k - $170k

    Claryo, Inc. is seeking an Integration Reliability Engineer in San Francisco, CA, responsible for ensuring the reliability of systems across cloud and edge environments. The candidate will build and maintain observability tools and improve incident response processes.... 
    Website

    Claryo, Inc.

    San Francisco, CA
    4 days ago
