Senior Site Reliability Engineer, Observability [Remote]
Chainlink Labs
- Remote job
Many of the world’s largest financial services institutions have also adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, ANZ, and top protocols such as Aave, Lido, GMX and many others. Chainlink leverages a novel fee model where offchain and onchain revenue from enterprise adoption is converted to LINK tokens and stored in a strategic Chainlink Reserve. Learn more at chain.link.
The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact in the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load.
This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow.
We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don't match 100% of the job requirements: those describe people we've usually had a great time working with, but they're not a tick-box exercise.
Your Impact
- Build and orchestrate a modern OTEL-based observability platform.
- Support multiple telemetry types, like metrics, logs and traces.
- Define and support modern observability governance and address problems at scale.
- Ensure reliability, security, and performance exceed our defined SLAs
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
- Lead the design and deployment of monitoring/observability services to detect and alert the team of needed action.
- Ingest, aggregate, transform, and utilize data from a multitude of sources in our real time data pipeline.
- Oversee the availability, performance, and supportability of our observability infrastructure.
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
- Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release.
- Champion reliability and security by taking the time to do your work right the first time
Requirements
- 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.
- Ability to develop software outside of the scope of typical infrastructure requirements and configurations
- Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
- Expert knowledge in all aspects of designing, developing, and managing large real-time systems
- Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution like an ELK Stack, Splunk or Grafana Stack.
- Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them
- Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews
- Excitement for blockchain, Web 3.0, and similar decentralized technologies.
- Experience running any infrastructure in the blockchain/web3 space
- Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Experience working remotely in a distributed team
- A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil
Our Stack
- AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
- We expect you to be comfortable with most of those tools and very proficient in several of them.
Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.
Global Data Privacy Notice for Job Candidates and Applicants
Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit, is subject to our Recruiting Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.

