Senior Site Reliability Engineer
$140k - $220k
Pylon
About the Job

You’ll own reliability and operational excellence for Pylon’s production systems. This means designing and implementing monitoring, alerting, and incident response processes that scale as we grow. You’ll build tooling that makes the entire engineering team more effective, establish on‑call rotations and runbooks, and ensure our platform can handle the demands of a regulated, high‑stakes financial product.

This is not a pure ops role. At Pylon, we believe operational toil should make up at most 50% of SRE work. If you’re spending more than half your time firefighting and keeping things running, you’re not doing SRE work; you’re doing sysadmin work. The other 50% or more of your time should be spent writing code: building infrastructure tooling, automating away operational burden, making reliability improvements to core services, and creating internal developer productivity tools. SRE is about making things better, not just keeping them alive.

We’re looking for someone who has operated production systems at scale in a professional engineering environment. You know what good looks like because you’ve built it before.
What We’re Looking For

Must‑haves:
- 4+ years of experience in SRE, infrastructure, or platform engineering roles
- Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
- Real on‑call experience at scale in a large production environment (you’ve carried the pager and lived through incidents)
- Deep AWS expertise (ECS, RDS, networking, security)
- Strong experience with declarative infrastructure (Terraform, CDK, or similar)
- Nix experience (we use it and want to expand its adoption)
- Track record of building reliability tooling and automation
- Can design and implement monitoring, alerting, and observability systems from first principles
- Comfortable working in a regulated environment where “breaking things” is not an option

Nice‑to‑haves:
- Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
- Background in fintech, healthtech, or other regulated domains
- Experience migrating monitoring systems or implementing SLOs
- Contributions to infrastructure tooling or open source projects

Our technology stack:
We don’t require that you’ve worked with any of these technologies before; this is just our stack for your information.
- Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK for infrastructure‑as‑code
- Observability: Honeycomb, OpenTelemetry
- CI/CD: GitHub Actions, Nix for builds and dev environments
- Core platform: TypeScript/Node backend, PostgreSQL, React frontend
- Languages: TypeScript, Python, Nix, SQL

About you

You:
- Have operated production systems at scale. You’ve been on‑call for a large, complex system. You know what 3 a.m. pages feel like, and you’ve built systems to prevent them. You understand the difference between alerts that matter and noise.
- Write code, not just YAML. You can build internal tools, automation, and reliability improvements. You’re comfortable contributing to the core product when reliability requires it. You can read and understand the codebase you’re responsible for keeping up.
- Think in systems. You understand distributed systems, failure modes, cascading failures, and graceful degradation. You can diagnose production issues quickly and know when to escal…
- Know your tools deeply. You’ve used observability platforms at scale and understand how to instrument systems properly. You can design alerting that has high signal and low noise. You know AWS inside and out.
- Have strong opinions that you’re willing to defend. We have a culture of vigorous discussion and debate on technical decisions. We’ll push you to defend your choices, and we want you to push back.
- Don’t settle. Challenge yourself to deliver exceptional work frequently and consistently. If something could be more reliable, take the initiative to improve it.
- Have great ideas, and lots of them. You should see opportunities all around you to make the infrastructure, tooling, and processes better. We’ll give you an environment where you can act on those ideas.
- Are self‑motivated. You can take a goal and drive towards it without needing extensive hand‑holding. The team is supportive and loves to share knowledge and advice, but there’s no time for micromanaging your work.
- Are comfortable with ambiguity. There are a million ways to do things; you should feel at ease making decisions under uncertainty while balancing competing constraints.
- Are confident you can learn quickly. Mortgage is complex, our platform is complex, and good SRE work is complex. You’ve got to have an attitude that you can absorb it, get on top of it, and build something better than what came before.
- Care about developer experience. Your work enables the entire team to ship faster and more confidently. You think about how to make the whole organization more effective.
Who Will Succeed Here

Someone who:
- Is deeply curious
- Wants to own features from design to development to deployment to maintenance
- Is willing to put the work in to solve the hardest of problems

Location: Palo Alto, CA
Base Salary Range: $140,000/yr to $220,000/yr + Equity + Benefits
- ...Infrastructure Footprint: Global production infrastructure across AWS, South America, and Europe. Role Overview: Seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available global infrastructure supporting production systems... · Senior
$210k - $270k
Zocdoc is seeking a Senior Site Reliability Engineer to develop and maintain distributed production systems. The ideal candidate will have over 5 years of experience in site reliability or production engineering, particularly in cloud environments like AWS. Responsibilities... · Senior
- ..., and the challenges of building in a high-growth startup, we’d love to talk. This is more than a job: it’s a journey. Site Reliability Engineers (SREs) are responsible for the overall performance and reliability of ASAPP's infrastructure and products. The team owns... · Senior · Remote work
- The Role: We're looking for a Senior Site Reliability Engineer to own the reliability, scalability, and operational excellence of the production systems that power Nectar's platform. We run high-volume data ingestion pipelines and real-time AI agents on top of a fast-growing... · Senior
$210k - $270k
Your Impact on our Mission: Zocdoc is looking for a Senior Site Reliability Engineer to help develop, monitor, and maintain our distributed production systems. You’ll be challenged with building frameworks and processes for ensuring uptime for our patients and providers... · Senior · Flexible hours
$180k - $260k
...facilitating effortless integration into customers’ logistics operations. About the role: We are seeking an experienced Senior/Staff Site Reliability Engineer to support the operation, monitoring, and scaling of our growing fleet of autonomous vehicles. In this role, you... · Senior · Odd job · Work at office · Remote work
- A leading tech recruiting firm is seeking a Site Reliability Engineer to manage and optimize cloud infrastructure primarily using GCP or AWS. The role involves maintaining high availability through Kubernetes clusters and improving CI/CD pipelines with Terraform. Ideal... · Senior
- Zocdoc, located in Silicon Valley, CA, is seeking a Senior Site Reliability Engineer to monitor and maintain cloud-based systems ensuring uptime for millions of patients. You'll work with cutting-edge technology in a diverse and collaborative environment. This role requires... · Senior
- Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals... ...position yourself among the top echelon in site reliability and AI-powered infrastructure automation. As a Senior Lead Site Reliability Engineer at JPMorgan... · Senior · Work at office
- ...building an AI Data Center AIOps platform that turns raw, high‑volume telemetry into reliable, job‑centric insights and automation for GPU fleets. Join our team of innovative engineers who are building this platform and operating it (not the compute cluster): uptime, performance... · Senior
$152k - $241.5k
...intelligence. Job Overview: We’re looking for a Senior SRE to join our Compute Farm team and... ...host lifecycle management, fleet reliability/auto‑healing, E2E observability or data‑... ...Python, Go, Perl, or Ruby. Mentored other engineers and influenced technical direction through... · Senior
$200k - $322k
...supportive environment, where NVIDIANs are inspired to excel and make a profound global impact. NVIDIA is seeking a Senior Manager of Site Reliability Engineering to lead and reshape how IT operations function at scale. This role goes beyond traditional service management... · Senior
- Poshmark, Inc. is seeking a talented Site Reliability Engineer to ensure the health and performance of our web-scale systems. You will collaborate with development teams and focus on automating and monitoring systems for high reliability. The ideal candidate has 5 years... · Senior
$150k - $180k
A technology-focused data center developer in Mountain View, CA is looking for a Senior Site Reliability Engineer to manage software infrastructure. This full-time position requires experience in Software Engineering or DevOps, with strong proficiency in Golang. The role... · Senior · Full time
$232k - $263k
...Join us as we define the future of SaaS security! Sr. Staff Site Reliability Engineer: As a Sr. Staff SRE at Obsidian, you will define and... ...Engineering, or related roles ~3+ years operating at a senior or technical leadership level (Staff or equivalent scope)... · Senior · Work from home · Flexible hours
$174k - $252k
Senior Software Engineer, Site Reliability Engineering, X. Applicants in San Francisco: Qualified applications with arrest or conviction records will be considered for employment in accordance with the San Francisco Fair Chance Ordinance for Employers and the California... · Senior · Full time
- ...vector database company for enterprise-grade AI. Founded by the engineers behind Milvus, the world’s most popular open-source vector... ...you will do: Work at the intersection of development and site reliability. Creating SRE tools and systems, as well as supporting... · Senior
$126k - $204.5k
...As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and... ...team to influence the operability of the product and ensure the reliability and availability of our services. Qualifications Required... · Senior
$145k - $165k
A technology solutions firm in Sunnyvale, CA is looking for a highly experienced Site Reliability Engineer (SRE). This role involves maintaining uptime and performance across systems. Exceptional Linux expertise and automation skills in Bash and Python are crucial. Key... · Senior
$176k - $276k
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build, and maintain large-scale production systems with high efficiency and availability using a combination of software and systems engineering practices. This is a highly specialized... · Senior
$175.8k - $264.2k
Senior Site Reliability Engineer - Apple Services Engineering (ASE) / iCloud, Cupertino, CA. People at Apple don't just build products; they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many... · Senior
- ...Site Reliability Engineer, Onsite - Bay Area, CA. Skills: Relevant Skills and Experience. What You’ll Do (Day-to-Day): Own and manage our cloud infrastructure (GCP or AWS, on-prem). Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters). Implement...
$180k - $360k
...knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who... ...Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform... · Temporary work · Relocation
$180k
...knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who... ...Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform... · Permanent employment · Temporary work · Relocation
- ...claim-filing software to backend fraud detection, we’re the engine that powers claims processing for some of the largest insurers... ...dynamic, collaborative, and rewarding. We are looking for a Staff Site Reliability Engineer to join our growing team. The ideal candidate will... · Temporary work · Remote work · Work from home · Home office
$109.35k - $243.2k
...Traveltechessentialist is looking for a Senior Release Engineer in Palo Alto, California, to architect and improve CI/CD pipelines in a landscape... ...range of $109,350–$243,200 USD, reflecting the importance of rapid yet reliable code deployment practices... · Senior
- ...join one of America's most beloved eCommerce companies as a Senior Release Engineer. This role will work across all web based brands and you'll... ...Skill Set: Specific experience deploying large scale web sites/products. Experience deploying cloud based apps. Strong... · Senior
$2,000 per month
...and purpose-built software that eliminates toil and improves reliability. We're also a team that grows people as well as systems. If... ..., we'd love to hear from you. What You Will Be Doing: Engineering software to automate large-scale systems - building internal... · Local area · Flexible hours
$86.33k - $191.9k
...guardrails to make going fast also going safely. Identifying reliability anti-patterns and solving them systemically. You dive deep into... ...of AI‑assisted developer tools and platforms to increase engineering productivity, enforce code quality standards, and enable real... · Local area · Flexible hours
- ...technologies. Our mission is to double America’s compute capacity without building new data centers. We are seeking a skilled Site Reliability Engineer to join our growing team. The ideal candidate will help ensure the reliability, scalability, and performance of our hybrid... · Work at office · Weekend work
- site reliability engineer sre Palo Alto, CA
- site reliability engineer Palo Alto, CA
- senior manager quality engineering Palo Alto, CA
- senior software test automation engineer Palo Alto, CA
- senior design verification engineer Palo Alto, CA
- senior director quality Palo Alto, CA
- senior director of development Palo Alto, CA
- consultant senior consultant Palo Alto, CA
- senior director clinical development Palo Alto, CA
- senior cloud solutions architect Palo Alto, CA


