Staff Site Reliability Engineer

$180k - $225k

Thrive Market

ABOUT THRIVE MARKET

Thrive Market was founded in 2014 with a mission to make healthy and sustainable living easy and affordable for everyone. As an online, membership-based market, we deliver the highest-quality healthy and sustainable products at member-only prices, while matching every paid membership with a free one for someone in need. Every day, we leverage innovative technology and member-first thinking to help our 1,700,000+ members find better products, support better brands, and build a better world in the process. We are also a Certified B Corporation, a Public Benefit Corporation, and a Climate Neutral Certified company.

Join us as we bring healthy and sustainable living to millions of Americans in the years to come.

THE ROLE

We're looking for a Staff Site Reliability Engineer to help define and build the reliability foundation for Thrive Market's platform. You'll be working with a first-class group of engineers to establish our SRE practice from the ground up: defining SLOs, SLIs, and error budgets, building observability into everything we do, and creating the frameworks that ensure our systems scale reliably during our company's rapid growth.

This is a high-impact role at an exciting inflection point. We've recently containerized our entire platform on Kubernetes, and we're evaluating a potential platform migration to a next-generation ecommerce platform. You'll be balancing hands-on reliability work with the strategic thinking needed to build systems that self-heal and get better over time.

If you've read books like The Google SRE Handbook, The Phoenix Project, Accelerate, The DevOps Handbook, etc., this is the right place for you!
RESPONSIBILITIES
Reliability & Observability
  • Define, implement, and own Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across critical platform services
  • Build and maintain comprehensive monitoring, alerting, and observability systems using tools like Datadog, Prometheus, Grafana, or similar platforms
  • Establish error budgets and use them to balance feature velocity with reliability investments
  • Lead incident response efforts, conduct blameless postmortems, and drive systemic improvements that prevent recurrence
  • Design and implement chaos engineering practices to proactively identify failure modes before they impact members
Infrastructure & Platform
  • Architect and optimize our Kubernetes-based container orchestration platform for reliability, performance, and cost efficiency
  • Support large infrastructure migrations, ensuring a smooth transition with minimal disruption to business operations
  • Contribute to the evaluation and execution of potential platform migrations, with a focus on reliability planning and risk mitigation
  • Design and implement automated deployment pipelines that enable rapid, error-free releases with feature flags and built-in rollback/roll-forward capabilities
  • Develop and own disaster recovery plans, capacity planning models, and system hardening initiatives
  • Collaborate closely with product engineering teams to help them scale their infrastructure in AWS and adopt SRE best practices
Culture & Process
  • Help establish SRE as a practice at Thrive Market, defining the team's charter, processes, and engagement model with product engineering teams
  • Champion a culture of operational excellence, continuous improvement, and data-driven reliability decisions
  • Create and maintain technical documentation covering architecture decisions, runbooks, incident response procedures, and operational playbooks
  • Participate in weekly on-call rotations and help build sustainable on-call practices that avoid burnout
  • Identify systemic problems and inefficiencies across the engineering organization and make strategic recommendations for improvement
QUALIFICATIONS
Required
  • B.S. in Computer Science or equivalent professional experience
  • 7+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering, with a proven track record of improving reliability at rapidly growing companies
  • Deep expertise in Kubernetes (K8s), including cluster management, Helm charts, service meshes, and production-grade container orchestration
  • Strong systems engineering background with advanced proficiency in Linux administration
  • Advanced scripting and automation skills in Bash, Python, Golang, Ruby, or similar languages
  • Extensive experience with core AWS services including EC2, ECS/EKS, S3, VPC, IAM, CloudWatch, Route 53, RDS, and Lambda
  • Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi, or similar)
  • Hands-on experience defining and implementing SLOs, SLIs, and error budgets in production environments
  • Deep understanding of CI/CD pipelines and deployment strategies (blue-green, canary, rolling deployments)
  • Expertise in monitoring and observability platforms (Datadog, Prometheus, Grafana, New Relic, or similar)
  • Strong knowledge of web application infrastructure, networking, load balancing, and security best practices
  • Excellent communication skills with the ability to lead incident response and facilitate blameless postmortems
Preferred
  • Experience with e-commerce platforms (Magento, Shopify, or comparable) and the unique reliability challenges they present at scale
  • Experience with Concourse CI, GitHub Actions (GHA), or similar deployment frameworks
  • Experience with chaos engineering tools and practices (Gremlin, Litmus, Chaos Monkey, or similar)
  • Familiarity with GitOps workflows (ArgoCD, Flux) and service mesh technologies (Istio, Linkerd)
  • Experience building and managing cost-optimization strategies for cloud infrastructure
  • Background in establishing SRE practices in organizations transitioning from traditional DevOps models
  • Experience with configuration management tools (Ansible, Chef, Puppet, or similar)
BELONG TO A BETTER COMPANY
  • Comprehensive health benefits (medical, dental, vision, life and disability)
  • Competitive salary (DOE) + equity
  • 401k plan
  • 9 Observed Holidays
  • Flexible Paid Time Off
  • Subsidized ClassPass Membership with access to fitness classes and wellness and beauty experiences
  • Ability to work in our beautiful office in Playa Vista
  • Free Thrive Market membership with exclusive employee discount
  • Coverage for Life Coaching & Therapy Sessions on our holistic mental health and well-being platform
We're a community of more than 1 million members who are united by a singular belief: It should be easy to find better products, support better brands, make better choices, and build a better world in the process.

At Thrive Market, we believe in building a diverse, inclusive, and authentic culture. If you are excited about this role along with our mission and values, we encourage you to apply.

Thrive Market is an EEO/Veterans/Disabled/LGBTQ employer

At Thrive Market, our goal is to be a diverse and inclusive workplace that is representative, at all job levels, of the members we serve and the communities we operate in. We're proud to be an inclusive company and an Equal Opportunity Employer and we prohibit discrimination and harassment of any kind. We believe that diversity and inclusion among our teammates is critical to our success as a company, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool. If you're thinking about joining our team, we expect that you would agree!

Employment with Thrive Market requires that employees be based in the United States. This is a condition of employment and must be maintained throughout the duration of employment.

If you need assistance or accommodation due to a disability, please contact us and we'll be happy to assist you.

Ensure your Thrive Market job offer is legitimate and don't fall victim to fraud. Thrive Market never seeks payment from job applicants. Thrive Market recruiters will only reach out to applicants from an @thrivemarket.com email address. For added security, where possible, apply through our company website at

© Thrive Market 2026 All rights reserved.

JOB INFORMATION
  • Compensation description: The base salary range for this position is $180,000 - $225,000 per year.
  • Compensation may vary outside of this range depending on several factors, including a candidate's qualifications, skills, competencies and experience, and geographic location.
  • Total Compensation includes Base Salary, Stock Options, Health & Wellness Benefits, Flexible PTO, and more!
  • This position requires traveling to our HQ office in Los Angeles, California, twice a year for all-company summits: once in the summer and once in the winter.
Vacancy posted 13 hours ago