Staff Observability Platform Engineer (SRE)
$118.45k - $236.9k
Oak St. Health
Lead Platform Reliability Engineer
We're building a world of health around every individual — shaping a more connected, convenient and compassionate health experience. At CVS Health®, you'll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger – helping to simplify health care one person, one family and one community at a time.
CVS Health PBM is looking for hands-on, passionate people who want to join a high-energy, growing team and be at the forefront of digital innovation that aims to reinvent what a pharmacy and a health care company can be in the digital world.
As a Lead Platform Reliability Engineer, you will design and implement metrics and observability frameworks with a strong focus on service level objectives (SLOs), service level indicators (SLIs), error budgets, and cloud infrastructure scaling and capacity estimation. This individual contributor role is critical to enhancing our monitoring and observability capabilities, while also driving automation initiatives related to quality gates within the release engineering process. You will work closely with cross-functional teams to ensure the reliability, performance, and scalable growth of our cloud-based systems.
Expectations for the Role:
Metrics Development: Define, implement, and maintain key performance metrics, SLOs, and SLIs to measure system reliability and performance. Ensure alignment with business objectives and operational goals.
Error Budgets: Manage error budgets effectively, collaborating with development teams to balance reliability and feature delivery. Analyze incidents and outages to inform adjustments to error budgets.
Monitoring & Observability: Design and implement comprehensive monitoring solutions to provide real-time visibility into system health. Utilize tools such as Prometheus, Grafana, Loki, Tempo, and other observability platforms to create dashboards and alerts.
Cloud Infrastructure Scaling: Architect, design, and implement scalable cloud infrastructure capable of supporting multiple business applications, ensuring reliability, performance, and future growth.
Quality Gates Automation: Develop and implement automated quality gates that ensure all releases meet defined reliability and performance standards. Lead the release DevOps team to integrate these gates into the CI/CD pipeline.
Incident Management: Assist in incident response efforts by providing insights from metrics and monitoring tools. Conduct post-mortem analyses to identify root causes and recommend preventive measures.
Required Qualifications
- 10+ years of experience in Software Engineering, Platform Engineering, or SRE.
- 7+ years of experience with observability practices, including SLIs/SLOs/SLAs, alerting, and incident management.
- 7+ years building production-grade backend services in Java/Python.
- 7+ years implementing and operating OpenTelemetry, including OTLP, semantic conventions, and instrumentation patterns.
- 7+ years with cloud-native and containerized platforms (Docker, Kubernetes, Argo CD).
- 7+ years working with public cloud platforms (AWS, GCP, or Azure).
- 5+ years designing and scaling distributed, high-volume data pipelines.
- 5+ years working with Grafana OSS or comparable observability backends (e.g., Grafana, Loki, Tempo, Prometheus).
- 5+ years with relational databases (PostgreSQL, MySQL).
Preferred Qualifications
- Excellent analytical skills and the ability to communicate complex technical concepts to non-technical stakeholders
- Experience with service meshes and networking technologies such as Envoy and Istio
- Experience integrating or operating commercial observability platforms (Splunk, AppDynamics, etc.)
- Experience with streaming and data platforms such as Kafka, Pulsar, or similar technologies
- Familiarity with time-series, NoSQL, or analytical databases (ClickHouse, Bigtable, Cassandra, etc.)
- Experience with Infrastructure as Code tools such as Terraform or CloudFormation
- Experience with cost optimization and capacity planning for large-scale cloud infrastructure
- Experience with chaos engineering, resiliency testing, or fault injection
- Background in security-aware platform design, including secure service-to-service communication
- Experience mentoring senior engineers and influencing platform standards across organizations
- Strong operational experience supporting 24x7 production systems, including on-call responsibilities
- Knowledge of security best practices in cloud environments
Bachelor's degree or equivalent experience (HS diploma + 4 years relevant experience)
The typical pay range for this role is: $118,450.00 - $236,900.00. This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company's equity award program.
Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.
Great benefits for great people. We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families. This full-time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well-being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.

