Customer Reliability Engineer - Infrastructure
$125k - $130k
Astronomer
Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 800 of the world's leading enterprises, Astronomer lets businesses do more with their data. To learn more, visit

About this role

The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers' usage of our managed Airflow service. The CREs are responsible for operating, monitoring, and maintaining the platform to ensure availability, predictability, and reliable operations. As an infrastructure specialist within the team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This entails responding to incidents raised either by a customer or by our monitoring system, and then taking further steps to ensure problems are permanently resolved or monitored. As owners of the observability platform, CRE has unlimited potential to improve the reliability of the product and deliver the best possible outcome for our customers.

This role is directly customer-facing and gives exposure to very diverse problems and requirements. CREs get the opportunity to interface with customers from a variety of industries across different cloud providers, all with different expectations. Your contributions will directly impact customers' success with the Astronomer products, and you will be able to help make meaningful improvements to the customer experience.

What you get to do:
- Provide solutions to customers to make them successful using our products.
- Troubleshoot customer environments and engage in active triaging with customers.
- Participate in the on-call rotation for weekend coverage.
- Provide feedback to the product development teams on customer needs and pain points.
- Build out our monitoring and alerting systems.
- Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.
- Help direct the architecture of the products and contribute where possible.
- Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
- Participate remotely within a fully distributed team.
- Enhance and enrich customer documentation.
- Work with the latest technology and multi-cloud implementations.

What you bring to the role:
- 5 years of experience, preferably with large, complex cloud infrastructures operating at scale
- 3 years of experience with Kubernetes
- Experience managing a production distributed system with at least one major cloud provider (one or all: AWS, GCP, Azure)
- Strong Linux experience
- Knowledge of how to operate and monitor distributed systems and troubleshoot their issues
- Previous experience handling customer issues (internal or external)
- Strong communication skills
- DevOps or CI/CD experience
- Python scripting
- Good troubleshooting skills

Bonus points if you have:
- Experience as a Site Reliability Engineer
- Experience working with Kubernetes Custom Resources
- Deep knowledge of Azure
- Airflow/Big Data orchestration experience
- IaC experience

The estimated total compensation for this role ranges from $125,000 to $130,000 based on leveling and geography, along with an equity component and a comprehensive benefits package. This range is merely an estimate; actual compensation may deviate from this range based on skills, experience, and qualifications.
- LI-Fulltime
- LI-Remote





