Senior Site Reliability Engineer

Blitzy

About this position

Blitzy is a Cambridge, MA-based AI software development platform on a mission to revolutionize the software development life cycle by autonomously building custom software to unlock the next industrial revolution. We're transforming how enterprises build software, turning enterprise requirements into production-ready code with an agentic software development platform that can autonomously execute 80% of the quantum of software development work. We're backed by multiple tier 1 investors, and have proven success as founders of previous start-ups.

Location: Cambridge, MA, Kendall Square HQ (In-Office)

The Role

As a Senior Site Reliability Engineer at Blitzy's Kendall Square headquarters, you will be a foundational force behind the reliability, scalability, and operational excellence of our AI-powered software development platform. Sitting at the intersection of software engineering and infrastructure, you'll ensure that the systems enabling enterprise customers to autonomously build production-ready software remain performant, resilient, and always available. This is a high-ownership, high-impact role for an engineer who operates with urgency, thinks in systems, and takes pride in building infrastructure that doesn't break.

What Success Looks Like

  • Blitzy's platform maintains industry-leading uptime: incidents are rare, and when they occur, they are resolved quickly with clear root cause analysis and lasting fixes.
  • SLOs and error budgets are defined for every critical service and actively used to drive engineering decisions, not just tracked passively.
  • Observability is a first-class capability: engineers across the company have the dashboards, traces, and alerts they need to understand system behavior without asking SRE.
  • Deployment pipelines are fast, safe, and reliable: releases go out with confidence and rollbacks are automated when something goes wrong.
  • Infrastructure is entirely codified: no manual provisioning, no configuration drift, every environment reproducible from source.
  • Engineering teams are more productive because of your work: platform friction is low, developer tooling is sharp, and SRE is seen as an accelerant, not a gatekeeper.
  • You are a trusted technical leader at HQ, influencing how Blitzy thinks about reliability as we scale our platform and our team.

Areas of Ownership

  • Design, build, and operate highly available, fault-tolerant infrastructure across cloud environments supporting Blitzy's AI platform and enterprise customers.
  • Define and own SLOs, SLAs, and error budgets for critical services; lead blameless postmortems and drive systemic improvements that prevent recurrence.
  • Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure that empower engineers to ship with speed and safety.
  • Own the full observability stack: logging, metrics, distributed tracing, and alerting (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
  • Manage Kubernetes clusters and container infrastructure supporting AI agent workloads and production application services.
  • Drive infrastructure-as-code practices using Terraform; ensure all provisioning is automated, auditable, and version-controlled.
  • Partner with engineering teams at HQ to embed reliability and operational best practices early in the development lifecycle.
  • Lead capacity planning, performance benchmarking, and cloud cost optimization as the platform scales.

Required Experience

  • 5–8 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Deep expertise in Kubernetes: cluster management, workload deployment, scaling strategies, and troubleshooting in production.
  • Strong proficiency with at least one major cloud platform (AWS preferred); experience designing and operating distributed, high-availability systems.
  • Hands-on Terraform experience for infrastructure-as-code provisioning and management.
  • Proven ability to define and operationalize SLOs, SLAs, and incident response processes.
  • Strong scripting and automation skills in Python, Go, or Bash.
  • Experience designing and maintaining comprehensive observability systems across complex, multi-service environments.
  • Excellent cross-functional communication skills, able to partner with software engineers, product teams, and leadership equally well.

What Makes You Stand Out

  • Experience operating infrastructure for AI or ML workloads, including GPU scheduling or model serving infrastructure.
  • Familiarity with MLOps tooling (MLflow, Kubeflow, or similar) and the operational challenges unique to AI-driven services.
  • Knowledge of service mesh technologies (Istio, Linkerd) and advanced networking patterns.
  • CKA (Certified Kubernetes Administrator) certification or equivalent demonstrated expertise.
  • Prior experience at a high-growth startup where you built reliability foundations from the ground up.
  • A track record of influencing engineering culture: not just fixing infrastructure, but raising the bar for how teams think about reliability.

What Makes This Role Different

Most SRE roles have you defending the status quo. At Blitzy, you're building reliability infrastructure for a platform that is actively rewriting how enterprises create software; there is no playbook, and that's the point. You'll be based at our Kendall Square headquarters, working daily alongside our co-founders and core engineering team, with direct influence over how we architect and operate systems at the frontier of AI. You'll receive meaningful equity, giving you real ownership in a company that is defining a new category. If you want to do the most consequential infrastructure work of your career, this is the role.

Our Culture

Who we are: Led by two pioneering co-founders, we are one of the fastest growing companies in the U.S., creating our own category of enterprise autonomous software development. We automate thousands of hours of software development for our customers, which includes strong representation within the Fortune 500.

How we work:

  • We move Blitzy Fast: Time is both our company's and our clients' most precious asset. We move quickly and decisively to innovate internally and deliver exceptional software externally.
  • Championship Mindset: We operate like a professional sports team. We win as a team by holding ourselves and each other to high standards, collaborating in-person, and remaining focused on the mission.
  • Passion for Invention: We're pushing the frontier of what's possible, requiring constant innovation and iteration.
  • We Work for the Customer: We focus on delivering outsized value to the customers we work with and expanding those relationships into deep, meaningful partnerships.

We believe in being 'everyday athletes', taking care of ourselves so we can bring our best minds to work. We promote great sleep, movement, and restorative activities for optimal mental performance. It makes for a happier and more productive team.

Blitzy is an equal opportunity employer committed to building a diverse and inclusive team. We believe different perspectives make us stronger.

Vacancy posted 2 days ago