Senior Site Reliability Engineer

Blitzy

About this position

Blitzy is a Cambridge, MA based AI software development platform on a mission to revolutionize the software development life cycle by autonomously building custom software to unlock the next industrial revolution. We're transforming how enterprises build software, turning enterprise requirements into production-ready code with an agentic software development platform that can autonomously execute 80% of the quantum of software development work. We're backed by multiple tier 1 investors, and have proven success as founders of previous start-ups.

Location: Cambridge, MA, Kendall Square HQ (In-Office)

The Role

As a Senior Site Reliability Engineer at Blitzy's Kendall Square headquarters, you will be a foundational force behind the reliability, scalability, and operational excellence of our AI-powered software development platform. Sitting at the intersection of software engineering and infrastructure, you'll ensure that the systems enabling enterprise customers to autonomously build production-ready software remain performant, resilient, and always available. This is a high-ownership, high-impact role for an engineer who operates with urgency, thinks in systems, and takes pride in building infrastructure that doesn't break.

What Success Looks Like

Blitzy's platform maintains industry-leading uptime: incidents are rare, and when they occur, they are resolved quickly with clear root cause analysis and lasting fixes.
SLOs and error budgets are defined for every critical service and actively used to drive engineering decisions, not just tracked passively.
Observability is a first-class capability: engineers across the company have the dashboards, traces, and alerts they need to understand system behavior without asking SRE.
Deployment pipelines are fast, safe, and reliable: releases go out with confidence and rollbacks are automated when something goes wrong.
Infrastructure is entirely codified: no manual provisioning, no configuration drift, every environment reproducible from source.
Engineering teams are more productive because of your work: platform friction is low, developer tooling is sharp, and SRE is seen as an accelerant, not a gatekeeper.
You are a trusted technical leader at HQ, influencing how Blitzy thinks about reliability as we scale our platform and our team.

Areas of Ownership

Design, build, and operate highly available, fault-tolerant infrastructure across cloud environments supporting Blitzy's AI platform and enterprise customers.
Define and own SLOs, SLAs, and error budgets for critical services; lead blameless postmortems and drive systemic improvements that prevent recurrence.
Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure that empower engineers to ship with speed and safety.
Own the full observability stack: logging, metrics, distributed tracing, and alerting (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
Manage Kubernetes clusters and container infrastructure supporting AI agent workloads and production application services.
Drive infrastructure-as-code practices using Terraform; ensure all provisioning is automated, auditable, and version-controlled.
Partner with engineering teams at HQ to embed reliability and operational best practices early in the development lifecycle.
Lead capacity planning, performance benchmarking, and cloud cost optimization as the platform scales.

Required Experience

5–8 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
Deep expertise in Kubernetes: cluster management, workload deployment, scaling strategies, and troubleshooting in production.
Strong proficiency with at least one major cloud platform (AWS preferred); experience designing and operating distributed, high-availability systems.
Hands-on Terraform experience for infrastructure-as-code provisioning and management.
Proven ability to define and operationalize SLOs, SLAs, and incident response processes.
Strong scripting and automation skills in Python, Go, or Bash.
Experience designing and maintaining comprehensive observability systems across complex, multi-service environments.
Excellent cross-functional communication skills, with the ability to partner equally well with software engineers, product teams, and leadership.

What Makes You Stand Out

Experience operating infrastructure for AI or ML workloads, including GPU scheduling or model serving infrastructure.
Familiarity with MLOps tooling (MLflow, Kubeflow, or similar) and the operational challenges unique to AI-driven services.
Knowledge of service mesh technologies (Istio, Linkerd) and advanced networking patterns.
CKA (Certified Kubernetes Administrator) certification or equivalent demonstrated expertise.
Prior experience at a high-growth startup where you built reliability foundations from the ground up.
A track record of influencing engineering culture: not just fixing infrastructure, but raising the bar for how teams think about reliability.

What Makes This Role Different

Most SRE roles have you defending the status quo. At Blitzy, you're building reliability infrastructure for a platform that is actively rewriting how enterprises create software; there is no playbook, and that's the point. You'll be based at our Kendall Square headquarters, working daily alongside our co-founders and core engineering team, with direct influence over how we architect and operate systems at the frontier of AI. You'll receive meaningful equity, giving you real ownership in a company that is defining a new category. If you want to do the most consequential infrastructure work of your career, this is the role.

Our Culture

Who we are: Led by two pioneering co-founders, we are one of the fastest growing companies in the U.S., creating our own category of enterprise autonomous software development. We automate thousands of hours of software development for our customers, which include strong representation within the Fortune 500.

How we work:

We move Blitzy Fast: Time is both our company's and our clients' most precious asset. We move quickly and decisively to innovate internally and deliver exceptional software externally.
Championship Mindset: We operate like a professional sports team. We win as a team by holding ourselves and each other to high standards, collaborating in-person, and remaining focused on the mission.
Passion for Invention: We're pushing the frontier of what's possible, requiring constant innovation and iteration.
We Work for the Customer: We focus on delivering outsized value to the customers we work with and expanding those relationships into deep, meaningful partnerships.

We believe in being 'everyday athletes', taking care of ourselves so we can bring our best minds to work. We promote great sleep, movement, and restorative activities for optimal mental performance. It makes for a happier and more productive team.

Blitzy is an equal opportunity employer committed to building a diverse and inclusive team. We believe different perspectives make us stronger.

Vacancy posted 3 days ago

