Staff Site Reliability Engineer

$180k - $225k

Thrive Market

ABOUT THRIVE MARKET

Thrive Market was founded in 2014 with a mission to make healthy and sustainable living easy and affordable for everyone. As an online, membership-based market, we deliver the highest-quality healthy and sustainable products at member-only prices, while matching every paid membership with a free one for someone in need. Every day, we leverage innovative technology and member-first thinking to help our 1,700,000+ members find better products, support better brands, and build a better world in the process. We are also a Certified B Corporation, a Public Benefit Corporation, and a Climate Neutral Certified company.

Join us as we bring healthy and sustainable living to millions of Americans in the years to come.

THE ROLE

We're looking for a Staff Site Reliability Engineer to help define and build the reliability foundation for Thrive Market's platform. You'll work with a first-class group of engineers to establish our SRE practice from the ground up: defining SLOs, SLIs, and error budgets, building observability into everything we do, and creating the frameworks that ensure our systems scale reliably during our company's rapid growth.

This is a high-impact role at an exciting inflection point. We've recently containerized our entire platform on Kubernetes, and we're evaluating a potential migration to a next-generation e-commerce platform. You'll balance hands-on reliability work with the strategic thinking needed to build systems that self-heal and get better over time.

If you've read books like The Google SRE Handbook, The Phoenix Project, Accelerate, The DevOps Handbook, etc., this is the right place for you!

RESPONSIBILITIES
Reliability & Observability

Define, implement, and own Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across critical platform services
Build and maintain comprehensive monitoring, alerting, and observability systems using tools like Datadog, Prometheus, Grafana, or similar platforms
Establish error budgets and use them to balance feature velocity with reliability investments
Lead incident response efforts, conduct blameless postmortems, and drive systemic improvements that prevent recurrence
Design and implement chaos engineering practices to proactively identify failure modes before they impact members

Infrastructure & Platform

Architect and optimize our Kubernetes-based container orchestration platform for reliability, performance, and cost efficiency
Support large infrastructure migrations, ensuring a smooth transition with minimal disruption to business operations
Contribute to the evaluation and execution of potential platform migrations, with a focus on reliability planning and risk mitigation
Design and implement automated deployment pipelines that enable rapid, error-free releases with feature flags and built-in rollback/roll-forward capabilities
Develop and own disaster recovery plans, capacity planning models, and system hardening initiatives
Collaborate closely with product engineering teams to help them scale their infrastructure in AWS and adopt SRE best practices

Culture & Process

Help establish SRE as a practice at Thrive Market, defining the team's charter, processes, and engagement model with product engineering teams
Champion a culture of operational excellence, continuous improvement, and data-driven reliability decisions
Create and maintain technical documentation covering architecture decisions, runbooks, incident response procedures, and operational playbooks
Participate in weekly on-call rotations and help build sustainable on-call practices that avoid burnout
Identify systemic problems and inefficiencies across the engineering organization and make strategic recommendations for improvement

QUALIFICATIONS
Required

B.S. in Computer Science or equivalent professional experience
7+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering, with a proven track record of improving reliability at rapidly growing companies
Deep expertise in Kubernetes (K8s) - including cluster management, Helm charts, service meshes, and production-grade container orchestration
Strong systems engineering background with advanced proficiency in Linux administration
Advanced scripting and automation skills in Bash, Python, Golang, Ruby, or similar languages
Extensive experience with core AWS services including EC2, ECS/EKS, S3, VPC, IAM, CloudWatch, Route 53, RDS, and Lambda
Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi, or similar)
Hands-on experience defining and implementing SLOs, SLIs, and error budgets in production environments
Deep understanding of CI/CD pipelines and deployment strategies (blue-green, canary, rolling deployments)
Expertise in monitoring and observability platforms (Datadog, Prometheus, Grafana, New Relic, or similar)
Strong knowledge of web application infrastructure, networking, load balancing, and security best practices
Excellent communication skills with the ability to lead incident response and facilitate blameless postmortems

Preferred

Experience with e-commerce platforms (Magento, Shopify, or comparable) and the unique reliability challenges they present at scale
Experience with Concourse CI, GitHub Actions (GHA), or similar deployment frameworks
Experience with chaos engineering tools and practices (Gremlin, Litmus, Chaos Monkey, or similar)
Familiarity with GitOps workflows (ArgoCD, Flux) and service mesh technologies (Istio, Linkerd)
Experience building and managing cost-optimization strategies for cloud infrastructure
Background in establishing SRE practices in organizations transitioning from traditional DevOps models
Experience with configuration management tools (Ansible, Chef, Puppet, or similar)

BELONG TO A BETTER COMPANY

Comprehensive health benefits (medical, dental, vision, life and disability)
Competitive salary (DOE) + equity
401k plan
9 Observed Holidays
Flexible Paid Time Off
Subsidized ClassPass Membership with access to fitness classes and wellness and beauty experiences
Ability to work in our beautiful office in Playa Vista
Free Thrive Market membership with exclusive employee discount
Coverage for Life Coaching & Therapy Sessions on our holistic mental health and well-being platform

We're a community of more than 1 million members who are united by a singular belief: It should be easy to find better products, support better brands, make better choices, and build a better world in the process.

At Thrive Market, we believe in building a diverse, inclusive, and authentic culture. If you are excited about this role along with our mission and values, we encourage you to apply.

Thrive Market is an EEO/Veterans/Disabled/LGBTQ employer.

At Thrive Market, our goal is to be a diverse and inclusive workplace that is representative, at all job levels, of the members we serve and the communities we operate in. We're proud to be an inclusive company and an Equal Opportunity Employer and we prohibit discrimination and harassment of any kind. We believe that diversity and inclusion among our teammates is critical to our success as a company, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool. If you're thinking about joining our team, we expect that you would agree!

Employment with Thrive Market requires that employees be based in the United States. This is a condition of employment and must be maintained throughout the duration of employment.

If you need assistance or accommodation due to a disability, please email us and we'll be happy to assist you.

Ensure your Thrive Market job offer is legitimate and don't fall victim to fraud. Thrive Market never seeks payment from job applicants. Thrive Market recruiters will only reach out to applicants from an @thrivemarket.com email address. For added security, where possible, apply through our company website.

JOB INFORMATION

Compensation Description - The base salary range for this position is $180,000 - $225,000 per year.
Compensation may vary outside of this range depending on several factors, including a candidate's qualifications, skills, competencies and experience, and geographic location.
Total Compensation includes Base Salary, Stock Options, Health & Wellness Benefits, Flexible PTO, and more!
This position requires traveling to our HQ office in Los Angeles, California, twice a year for all-company summits: once in the summer and once in the winter.

Vacancy posted 3 days ago
