Principal Site Reliability Engineer

$146.96k - $220.44k

ViziRecruiter,LLC.

Introduction

Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which also includes five leading omnichannel grocery brands – Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Ahold Delhaize USA associates support the brands with a wide range of services, including Finance, Legal, Sustainability, Commercial, Digital and E-commerce, Technology and more.

Overview

The Site Reliability Engineer (SRE) IV is a senior technical leader responsible for designing, guiding, and scaling site reliability engineering practices across complex, distributed systems. This role plays a crucial part in driving operational excellence, ensuring system resiliency, and fostering a high-performing engineering culture. The SRE IV works closely with senior leadership, engineering, and product teams to set strategic goals around availability, performance, and incident response while leading large-scale reliability initiatives.

This position emphasizes deep technical expertise in platforms such as Spring Boot, Java, Tomcat, Redis, and Kafka, along with infrastructure tooling like AKS, Kubernetes, ArgoCD, Terraform, GitHub Actions, and observability platforms like Datadog. The ideal candidate will also bring strong experience working with Ubuntu/Linux environments, containerization with Docker, and automation of operational workflows across a modern DevOps toolchain.

Our flexible/hybrid work schedule includes 3 in-person days at one of our Chicago, IL offices and 2 remote days.

Applicants must be currently authorized to work in the United States on a full-time basis.

Responsibilities

Architect, evolve, and lead implementation of enterprise-level SRE frameworks, tools, and cloud-native reliability strategies.
Build, scale, and manage microservices platforms using Spring Boot, Java, Tomcat, and Redis with Kubernetes and AKS.
Lead technical design reviews, chaos testing, and infrastructure planning with an emphasis on scalability, high availability, and fault tolerance.
Define, implement, and refine SLOs/SLIs and operational health indicators for business-critical services.
Automate infrastructure provisioning and application deployment workflows using Terraform, GitHub Actions, and ArgoCD.
Drive observability and telemetry adoption using Datadog, including dashboards, alerts, custom metrics, and distributed tracing.
Act as incident commander during critical production issues; conduct blameless postmortems and guide root cause remediation.
Lead cross-team efforts to reduce mean time to detect (MTTD) and mean time to resolve (MTTR), and to promote self-healing systems.
Partner with security and compliance teams to ensure that systems are secure, auditable, and operationally compliant.
Enhance service resiliency through strategies including Kafka-based event-driven architecture, retries, rate limiting, and circuit breakers.
Mentor junior SREs and engineers, lead technical communities of practice, and promote a culture of continuous improvement.
Maintain and improve Ubuntu-based production systems and containerized workloads with Docker.
Evaluate and integrate emerging DevOps technologies to support scalability and reliability objectives.

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field; equivalent practical experience may be considered.
8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles in large-scale production environments.
Expertise in building and maintaining Java-based microservices using Spring Boot, Tomcat, and Redis in containerized deployments.
Strong hands-on experience with Kubernetes, AKS, and ArgoCD for orchestration and GitOps deployment workflows.
Proficiency in Python, Java, Bash, or Go for automation, scripting, and infrastructure tooling.
Proven ability to implement observability platforms and practices using Datadog (metrics, logs, traces, dashboards, alerts).
Advanced experience working with CI/CD pipelines using GitHub and GitHub Actions.
Deep understanding of networking, Linux (especially Ubuntu), distributed systems, and container security.
Experience operating message-driven architectures using Kafka, with an emphasis on throughput, retries, and resilience.
Solid knowledge of Terraform and infrastructure as code best practices.
Excellent communication, collaboration, and stakeholder alignment skills across engineering and business teams.

Salary Range: $146,960 - $220,440

Actual compensation offered to a candidate may vary based on their unique qualifications and experience, internal equity, and market conditions. Final compensation decisions will be made in accordance with company policies and applicable laws.

Vacancy posted 4 days ago
