Site Reliability Engineer

HostPapa

Position Summary

With team members and customers in 39 countries around the globe, HostPapa is currently one of the fastest-growing web hosting companies with a wide range of products available. At its core, we provide individuals and small and medium-sized businesses with access to valuable tools and services critical to their online success, including a Website Builder service for making website creation an ultra-easy task for anyone. Tailored to meet every user's unique needs, our award-winning customer support, email, and cloud-based solutions keep HostPapa at the cutting edge of the web hosting industry and innovation by putting our customers first.

This role focuses on CloudBlue, a HostPapa business that powers cloud commerce for many of the world’s largest service providers, including major Telcos, distributors, and MSPs. CloudBlue enables partners to monetize and manage cloud services and subscriptions at scale, combining the agility of a high-growth business with the backing of a global organization.

As the Site Reliability Engineer, you will help ensure the reliability, scalability, and observability of CloudBlue’s multi-tenant SaaS platforms used by service providers worldwide. You will focus on improving system stability and performance through monitoring, high availability, and incident response, while working closely with DevOps, Platform, and Engineering teams to build and operate resilient production systems.

What you’ll do
  • Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance
  • Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing
  • Reduce operational toil by identifying opportunities for automation and process improvement
  • Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack
  • Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health
  • Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones
  • Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability
  • Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration
  • Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact
  • Improve reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing
  • Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability
  • Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams
  • Support other tasks or projects as assigned to meet team and business needs

About you
  • 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems
  • Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
  • Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana
  • Solid understanding of Linux, networking, and distributed systems fundamentals
  • Experience working with containerized environments such as Docker and Kubernetes
  • Strong scripting and automation skills using Python and/or Bash
  • Experience participating in on-call rotations and incident response in production environments
  • Strong written and spoken English
  • Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus
  • Exposure to hyperscale or service-provider-grade platforms is an advantage
  • Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued
  • Experience working with hybrid or on-premises integrations is beneficial
  • Familiarity with chaos engineering and resilience testing will be considered an asset

What We Offer
  • Work from anywhere - this is a remote opportunity
  • A competitive salary that values you and your unique skill sets
  • Career advancement & professional development opportunities to help you reach your full potential
  • Flexible work arrangements to support work/life balance

About Us

At HostPapa, we’ve been committed to providing a complete array of enterprise-grade cloud services solutions to every business owner since 2006. These services, traditionally out of reach to smaller businesses, are offered in a one-stop shop, making it quick and easy for customers to select the services they need to grow. We back these offerings with 24/7 award‑winning customer support in four languages.

Our HostPapa team values diversity and inclusion. We have a friendly company culture built on trust and respect. With the acquisition of several companies into our product portfolio, we’re growing at an incredible rate and have ample opportunities for career growth. Come join our talented team of enthusiastic, hard-working, passionate, driven people engaged in meaningful, innovative work. We can’t wait to meet you!

HostPapa is an equal-opportunity employer committed to diversity and inclusion. As a multicultural organization, we encourage individual achievement and recognize the strength of our diverse team. HostPapa is committed to providing accommodations for people with disabilities. If you require accommodation, please let us know, and we will work with you to meet your needs. Accommodation may be provided in all parts of the hiring process.

It is anticipated that this position will be performed outside of Ontario.

Vacancy posted 11 hours ago
