Senior Site Reliability Engineer

$140k - $220k

Pylon

About the Job

You’ll own reliability and operational excellence for Pylon’s production systems. This means designing and implementing monitoring, alerting, and incident response processes that scale as we grow. You’ll build tooling that makes the entire engineering team more effective, establish on‑call rotations and runbooks, and ensure our platform can handle the demands of a regulated, high‑stakes financial product.

This is not a pure ops role. At Pylon, we believe SRE work should be a maximum of 50% operational toil. If you’re spending more than half your time firefighting and keeping things running, you’re not doing SRE work, you’re doing sysadmin work. The other 50%+ of your time should be spent writing code: building infrastructure tooling, automating away operational burden, making reliability improvements to core services, and creating internal developer productivity tools that make the entire team more effective. SRE is about making things better, not just keeping them alive.

We’re looking for someone who has operated production systems at scale in a professional engineering environment. You know what good looks like because you’ve built it before.

What We’re Looking For

Must‑haves:
  • 4+ years experience in SRE, infrastructure, or platform engineering roles
  • Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
  • Real on‑call experience at scale in a large production environment (you’ve carried the pager and lived through incidents)
  • Deep AWS expertise (ECS, RDS, networking, security)
  • Strong experience with declarative infrastructure (Terraform, CDK, or similar)
  • Nix experience (we use it and want to expand its adoption)
  • Track record of building reliability tooling and automation
  • Can design and implement monitoring, alerting, and observability systems from first principles
  • Comfortable working in a regulated environment where “breaking things” is not an option

Nice‑to‑haves:
  • Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
  • Background in fintech, healthtech, or other regulated domains
  • Experience migrating monitoring systems or implementing SLOs
  • Contributions to infrastructure tooling or open source projects

Our technology stack:

We don’t require that you’ve worked with any of these technologies before, this is just our stack for your information:
  • Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK for infrastructure‑as‑code
  • Observability: Honeycomb, OpenTelemetry
  • CI/CD: GitHub Actions, Nix for builds and dev environments
  • Core platform: TypeScript/Node backend, PostgreSQL, React frontend
  • Languages: TypeScript, Python, Nix, SQL

About you

You:
  • Have operated production systems at scale. You’ve been on‑call for a large, complex system. You know what 3 am pages feel like and you’ve built systems to prevent them. You understand the difference between alerts that matter and noise.
  • Write code, not just YAML. You can build internal tools, automation, and reliability improvements. You’re comfortable contributing to the core product when reliability requires it. You can read and understand the codebase you’re responsible for keeping up.
  • Think in systems. You understand distributed systems, failure modes, cascading failures, and graceful degradation. You can diagnose production issues quickly and know when to escal…
  • Know your tools deeply. You’ve used observability platforms at scale and understand how to instrument systems properly. You can design alerting that has high signal and low noise. You know AWS inside and out.
  • Have strong opinions that you’re willing to defend. We have a culture of vigorous discussion and debate on technical decisions. We’ll push you to defend your choices, and we want you to push back.
  • Don’t settle. Challenge yourself to frequently and consistently deliver exceptional work. If something could be more reliable, take the initiative to improve it.
  • Have great ideas, and lots of them. You should see opportunities all around you to make the infrastructure, tooling and processes better. We’ll give you an environment where you can act on those ideas.
  • Are self‑motivated. You can take a goal and drive towards it without needing extensive hand‑holding. The team is supportive and loves to share knowledge and advice, but there’s no time for micromanaging your work.
  • Are comfortable with ambiguity. There’s a million ways to do things; you should feel at ease making a decision under uncertainty while balancing competing constraints.
  • Are confident you can learn quickly. Mortgage is complex, our platform is complex, good SRE work is complex. You’ve got to have an attitude that you can absorb it, get on top of it, and build something better than what came before.
  • Care about developer experience. Your work enables the entire team to ship faster and more confidently. You think about how to make the whole organization more effective.

Who Will Succeed Here

Someone who:
  • Is deeply curious
  • Wants to own features from design to development to deployment to maintenance
  • Is willing to put the work in to solve the hardest of problems

Location: Palo Alto, CA
Base Salary Range: $140,000/yr to $220,000/yr + Equity + Benefits

Vacancy posted 8 hours ago
