Customer Reliability Engineer - Infrastructure

$125k - $130k
Full-time

Astronomer

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 800 of the world's leading enterprises, Astronomer lets businesses do more with their data.

About this role

The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers' usage of our managed Airflow service. The CREs are responsible for operating, monitoring, and maintaining the platform to ensure availability, predictability, and reliable operations. As an infrastructure specialist within the team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This entails responding to incidents, whether raised by a customer or by our monitoring system, and then taking further steps to ensure problems are permanently resolved or monitored. As owners of the observability platform, CRE has unlimited potential to improve the reliability of the product and deliver the best possible outcome for our customers.

This role is directly customer-facing and gives exposure to very diverse problems and requirements. CREs get the opportunity to interface with customers from a variety of industries, across different cloud providers, and all with different expectations. Your contributions will directly impact customers' success with the Astronomer products, and you will be able to help make meaningful improvements to the customer experience.

What you get to do:
  • Provide solutions to customers to make them successful using our products.
  • Troubleshoot customer environments and engage in active triaging with customers.
  • Participate in on-call rotation for weekend coverage.
  • Provide feedback to the product development teams on customer needs and pain points.
  • Build out our monitoring and alerting systems.
  • Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.
  • Help direct the architecture of the products and contribute where possible.
  • Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
  • Participate remotely within a fully distributed team.
  • Enhance and enrich customer documentation.
  • Work with the latest technology and multi-cloud implementations.

What you bring to the role:
  • 5 years of experience, preferably with large, complex cloud infrastructures operating at scale
  • 3 years of experience with Kubernetes
  • Experience managing a production distributed system with at least one major cloud provider (one or all: AWS, GCP, Azure)
  • Strong Linux experience
  • Knowledge of how to operate distributed systems and monitor them for issues
  • Previous experience in handling customer issues (internal or external)
  • Strong communication skills
  • DevOps or CI/CD experience
  • Python scripting
  • Good troubleshooting skills

Bonus points if you have:
  • Experience as a Site Reliability Engineer
  • Worked with Kubernetes Custom Resources
  • Depth of knowledge with Azure
  • Airflow/Big Data Orchestration experience
  • IaC experience

The estimated total compensation for this role ranges from $125,000 - $130,000 based on leveling and geography, along with an equity component and a comprehensive benefits package. This range is merely an estimate; actual compensation may deviate from this range based on skills, experience, and qualifications.

At Astronomer, we value diversity. We are an equal opportunity employer: we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Vacancy posted 2 days ago