Senior Site Reliability Engineer

$140k - $220k

Pylon

About the Job

You’ll own reliability and operational excellence for Pylon’s production systems. This means designing and implementing monitoring, alerting, and incident response processes that scale as we grow. You’ll build tooling that makes the entire engineering team more effective, establish on‑call rotations and runbooks, and ensure our platform can handle the demands of a regulated, high‑stakes financial product.

This is not a pure ops role. At Pylon, we believe SRE work should be a maximum of 50% operational toil. If you’re spending more than half your time firefighting and keeping things running, you’re not doing SRE work, you’re doing sysadmin work. The other 50%+ of your time should be spent writing code: building infrastructure tooling, automating away operational burden, making reliability improvements to core services, and creating internal developer productivity tools that make the entire team more effective. SRE is about making things better, not just keeping them alive.

We’re looking for someone who has operated production systems at scale in a professional engineering environment. You know what good looks like because you’ve built it before.

What We’re Looking For

Must‑haves:
4+ years experience in SRE, infrastructure, or platform engineering roles
Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
Real on‑call experience at scale in a large production environment (you’ve carried the pager and lived through incidents)
Deep AWS expertise (ECS, RDS, networking, security)
Strong experience with declarative infrastructure (Terraform, CDK, or similar)
Nix experience (we use it and want to expand its adoption)
Track record of building reliability tooling and automation
Can design and implement monitoring, alerting, and observability systems from first principles
Comfortable working in a regulated environment where “breaking things” is not an option

Nice‑to‑haves:
Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
Background in fintech, healthtech, or other regulated domains
Experience migrating monitoring systems or implementing SLOs
Contributions to infrastructure tooling or open source projects

Our technology stack:
We don’t require that you’ve worked with any of these technologies before; this is just our stack for your information.
Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK for infrastructure‑as‑code
Observability: Honeycomb, OpenTelemetry
CI/CD: GitHub Actions, Nix for builds and dev environments
Core platform: TypeScript/Node backend, PostgreSQL, React frontend
Languages: TypeScript, Python, Nix, SQL

About you

You:
Have operated production systems at scale. You’ve been on‑call for a large, complex system. You know what 3 am pages feel like and you’ve built systems to prevent them. You understand the difference between alerts that matter and noise.
Write code, not just YAML. You can build internal tools, automation, and reliability improvements. You’re comfortable contributing to the core product when reliability requires it. You can read and understand the codebase you’re responsible for keeping up.
Think in systems. You understand distributed systems, failure modes, cascading failures, and graceful degradation. You can diagnose production issues quickly and know when to escal…
Know your tools deeply. You’ve used observability platforms at scale and understand how to instrument systems properly. You can design alerting that has high signal and low noise. You know AWS inside and out.
Have strong opinions that you’re willing to defend. We have a culture of vigorous discussion and debate on technical decisions. We’ll push you to defend your choices, and we want you to push back.
Don’t settle. Challenge yourself to frequently and consistently deliver exceptional work. If something could be more reliable, take the initiative to improve it.
Have great ideas, and lots of them. You should see opportunities all around you to make the infrastructure, tooling and processes better. We’ll give you an environment where you can act on those ideas.
Are self‑motivated. You can take a goal and drive towards it without needing extensive hand‑holding. The team is supportive and loves to share knowledge and advice, but there’s no time for micromanaging your work.
Are comfortable with ambiguity. There’s a million ways to do things; you should feel at ease making a decision under uncertainty while balancing competing constraints.
Are confident you can learn quickly. Mortgage is complex, our platform is complex, good SRE work is complex. You’ve got to have an attitude that you can absorb it, get on top of it, and build something better than what came before.
Care about developer experience. Your work enables the entire team to ship faster and more confidently. You think about how to make the whole organization more effective.

Who Will Succeed Here

Someone who:
Is deeply curious
Wants to own features from design to development to deployment to maintenance
Is willing to put the work in to solve the hardest of problems

Location: Palo Alto, CA
Base Salary Range: $140,000/yr to $220,000/yr + Equity + Benefits

Vacancy posted 8 hours ago

Similar jobs that could be interesting for you
Based on the Senior Site Reliability Engineer in Palo Alto, CA vacancy

Senior Site Reliability Engineer / DevOps Engineer
...Infrastructure Footprint: Global production infrastructure across AWS, South America, and Europe Role Overview Seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available global infrastructure supporting production systems...
Senior
Prophet Town
Mountain View, CA
8 hours ago
Senior Site Reliability Engineer — Hybrid + Unlimited PTO
$210k - $270k
Zocdoc is seeking a Senior Site Reliability Engineer to develop and maintain distributed production systems. The ideal candidate will have over 5 years of experience in site reliability or production engineering, particularly in cloud environments like AWS. Responsibilities...
Senior
GoTo Meeting
Palo Alto, CA
1 day ago
Senior Site Reliability Engineer
..., and the challenges of building in a high-growth startup, we’d love to talk. This is more than a job—it’s a journey. Site Reliability Engineers (SREs) are responsible for the overall performance and reliability of ASAPP's infrastructure and products. The team owns...
Senior
Remote work
ASAPP
Mountain View, CA
26 days ago
Senior Site Reliability Engineer
The Role We're looking for a Senior Site Reliability Engineer to own the reliability, scalability, and operational excellence of the production systems that power Nectar's platform. We run high-volume data ingestion pipelines and real-time AI agents on top of a fast-growing...
Senior
Nectar
Palo Alto, CA
1 day ago
Senior Site Reliability Engineer
$210k - $270k
Your Impact on our Mission: Zocdoc is looking for a Senior Site Reliability Engineer to help develop, monitor, and maintain our distributed production systems. You’ll be challenged with building frameworks and processes for ensuring uptime for our patients and providers...
Senior
Flexible hours
GoTo Meeting
Palo Alto, CA
1 day ago
Senior/Staff Site Reliability Engineer
$180k - $260k
...facilitating effortless integration into customers’ logistics operations. About the role We are seeking an experienced Senior/Staff Site Reliability Engineer to support the operation, monitoring, and scaling of our growing fleet of autonomous vehicles. In this role, you...
Senior
Odd job
Work at office
Remote work
Booster
Mountain View, CA
1 day ago
Senior Site Reliability Engineer: Cloud, Kubernetes & CI/CD
A leading tech recruiting firm is seeking a Site Reliability Engineer to manage and optimize cloud infrastructure primarily using GCP or AWS. The role involves maintaining high availability through Kubernetes clusters and improving CI/CD pipelines with Terraform. Ideal...
Senior
Amiri Recruiting
Mountain View, CA
1 day ago
Senior Site Reliability Engineer | Uptime, Cloud & GenAI
Zocdoc, located in Silicon Valley, CA, is seeking a Senior Site Reliability Engineer to monitor and maintain cloud-based systems ensuring uptime for millions of patients. You'll work with cutting-edge technology in a diverse and collaborative environment. This role requires...
Senior
Dormont Manufacturing Co
Palo Alto, CA
4 days ago
Senior Lead Site Reliability Engineer
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals... ...position yourself among the top echelon in site reliability and AI-powered infrastructure automation. As a Senior Lead Site Reliability Engineer at JPMorgan...
Senior
Work at office
JPMorgan Chase & Co.
Palo Alto, CA
1 day ago
Senior Site Reliability Engineer, AIOPs
...building an AI Data Center AIOps platform that turns raw, high‑volume telemetry into reliable, job‑centric insights and automation for GPU fleets. Join our team of innovative engineers who are building this platform and operating it (not the compute cluster): uptime, performance...
Senior
NVIDIA Gruppe
Santa Clara, CA
8 hours ago
Senior Site Reliability Engineer - HPC
$152k - $241.5k
...intelligence. Job Overview We’re looking for a Senior SRE to join our Compute Farm team and... ...host lifecycle management, fleet reliability/auto‑healing, E2E observability or data‑... ...Python, Go, Perl, or Ruby. Mentored other engineers and influenced technical direction through...
Senior
NVIDIA Gruppe
Santa Clara, CA
7 hours ago
Senior Manager, Site Reliability Engineering
$200k - $322k
...supportive environment, where NVIDIANs are inspired to excel and make a profound global impact. NVIDIA is seeking a Senior Manager of Site Reliability Engineering to lead and reshape how IT operations function at scale. This role goes beyond traditional service management...
Senior
NVIDIA Gruppe
Santa Clara, CA
8 hours ago
Senior Site Reliability Engineer: Scale, Automation & Cloud
Poshmark, Inc. is seeking a talented Site Reliability Engineer to ensure the health and performance of our web-scale systems. You will collaborate with development teams and focus on automating and monitoring systems for high reliability. The ideal candidate has 5 years...
Senior
Poshmark, Inc.
Redwood City, CA
4 days ago
Senior SRE & Software Engineer: Infra-as-Code & Cloud
$150k - $180k
A technology-focused data center developer in Mountain View, CA is looking for a Senior Site Reliability Engineer to manage software infrastructure. This full-time position requires experience in Software Engineering or DevOps, with strong proficiency in Golang. The role...
Senior
Full time
Verrus, LLC
Mountain View, CA
2 days ago
Sr. Staff Site Reliability Engineer
$232k - $263k
...Join us as we define the future of SaaS security! Sr. Staff Site Reliability Engineer As a Sr. Staff SRE at Obsidian , you will define and... ...Engineering, or related roles ~3+ years operating at a senior or technical leadership level (Staff or equivalent scope)...
Senior
Work from home
Flexible hours
Obsidian Security
Palo Alto, CA
4 days ago
Senior Software Engineer, Site Reliability Engineering
$174k - $252k
Senior Software Engineer, Site Reliability Engineering X Applicants in San Francisco: Qualified applications with arrest or conviction records will be considered for employment in accordance with the San Francisco Fair Chance Ordinance for Employers and the California...
Senior
Full time
Google Inc.
Sunnyvale, CA
1 day ago
Senior Site Reliability Engineer Cloud Platform
...vector database company for enterprise-grade AI. Founded by the engineers behind Milvus, the world’s most popular open-source vector... ...you will do: Work at the intersection of development and site reliability. Creating SRE tools and systems, as well as supporting...
Senior
Zilliz
Redwood City, CA
26 days ago
Senior Staff Site Reliability Engineer
$126k - $204.5k
...As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and... ...team to influence the operability of the product and ensure the reliability and availability of our services. Qualifications Required...
Senior
Palo Alto Networks, Inc.
Santa Clara, CA
4 days ago
Senior Site Reliability Engineer — Scale, Automation & Uptime
$145k - $165k
A technology solutions firm in Sunnyvale, CA is looking for a highly experienced Site Reliability Engineer (SRE). This role involves maintaining uptime and performance across systems. Exceptional Linux expertise and automation skills in Bash and Python are crucial. Key...
Senior
Bolt Graphics, Inc.
Sunnyvale, CA
2 days ago
Senior Site Reliability Engineer - Observability and Telemetry Platform
$176k - $276k
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized...
Senior
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior Site Reliability Engineer - Apple Services Engineering (ASE) / iCloud at Apple Cupertino, CA
$175.8k - $264.2k
Senior Site Reliability Engineer - Apple Services Engineering (ASE) / iCloud Cupertino, CA People at Apple don't just build products - they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many...
Senior
Hong Kong Study Skills Research Institute
Cupertino, CA
4 days ago
Site Reliability Engineer
...Site Reliability Engineer Onsite- Bay Area, CA Skills Relevant Skills and Experience What You’ll Do (Day-to-Day) Own and manage our cloud infrastructure (GCP or AWS, on-prem). Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters). Implement...
Amiri Recruiting
Mountain View, CA
3 days ago
Site Reliability Engineer - Cybersecurity
$180k - $360k
...knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who... ...Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform...
Temporary work
Relocation
Pantera Capital
Palo Alto, CA
1 day ago
Site Reliability Engineer - Cybersecurity
$180k
...knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who... ...Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform...
Permanent employment
Temporary work
Relocation
xAI
Palo Alto, CA
a month ago
Staff Site Reliability Engineer
...claim-filing software to backend fraud detection, we’re the engine that powers claims processing for some of the largest insurers... ...dynamic, collaborative, and rewarding. We are looking for a Staff Site Reliability Engineer to join our growing team. The ideal candidate will...
Temporary work
Remote work
Work from home
Home office
ASSURED
Palo Alto, CA
7 hours ago
Senior Release Engineer: AI CI/CD Architect
$109.35k - $243.2k
...Traveltechessentialist is looking for a Senior Release Engineer in Palo Alto, California, to architect and improve CI/CD pipelines in a landscape... ...range of $109,350–$243,200 USD, reflecting the importance of rapid yet reliable code deployment practices...
Senior
Traveltechessentialist
Palo Alto, CA
8 hours ago
Senior Release Engineer
...join one of America's most beloved eCommerce companies as a Senior Release Engineer. This role will work across all web based brands and you'll... ...Skill Set Specific experience deploying large scale web sites/products Experience deploying cloud based apps Strong...
Senior
Black Swan Search
Mountain View, CA
3 days ago
Site Reliability Engineer (Hosted Infra) - Platform
$2,000 per month
...and purpose-built software that eliminates toil and improves reliability. We're also a team that grows people as well as systems. If... ..., we'd love to hear from you. What You Will Be Doing: Engineering software to automate large-scale systems - building internal...
Local area
Flexible hours
Elastic
Mountain View, CA
1 day ago
Site Reliability Engineer - 2
$86.33k - $191.9k
...guardrails to make going fast also going safely. Identifying reliability anti-patterns and solving them systemically . You dive deep into... ...of AI‑assisted developer tools and platforms to increase engineering productivity, enforce code quality standards, and enable real...
Local area
Flexible hours
Traveltechessentialist
Palo Alto, CA
4 days ago
Site Reliability Engineer
...technologies. Our mission is to double America’s compute capacity without building new data centers. We are seeking a skilled Site Reliability Engineer to join our growing team. The ideal candidate will help ensure the reliability, scalability, and performance of our hybrid...
Work at office
Weekend work
FLUIX
Palo Alto, CA
1 day ago
