Lead Observability Platform Engineer

United IT

Location: Remote

As a Lead Observability Platform Engineer, you will design, build, and operate large-scale observability services that process billions of logs, metrics, and traces daily. You will develop high-performance backend services using Go, Java, and Node.js, and lead the adoption of OpenTelemetry-based instrumentation and standards across the enterprise.

In this role, you will partner closely with SRE, Cloud Engineering, CI/CD, Infrastructure, Security, and application teams to shape platform strategy, enhance developer experience, and ensure reliable, secure, and cost-efficient observability at scale. You will provide senior technical leadership, influence architectural direction, and help deliver a world-class, self-service observability ecosystem that accelerates engineering productivity and operational excellence.

Key Responsibilities
  • Design, build, and operate core observability platform services using Go, Java (Spring Boot), and Node.js.
  • Lead enterprise-wide adoption of OpenTelemetry, including client libraries, semantic conventions, instrumentation patterns, and Collector/agent strategy.
  • Architect and scale high-throughput, fault-tolerant telemetry pipelines (logs, metrics, traces) with a focus on performance, reliability, and cost efficiency.
  • Develop self-service observability capabilities that simplify onboarding, troubleshooting, and adoption for application teams.
  • Implement end-to-end monitoring of the observability platform itself, defining SLOs, health checks, and alerting.
  • Collaborate with SRE, Platform, and Cloud teams to establish reliability standards, error budgets, and incident response practices.
  • Participate in on-call rotations and lead incident mitigation, root-cause analysis, and post-incident reviews.
  • Automate operational workflows and eliminate manual toil through tooling, CI/CD enhancements, and platform automation.
  • Ensure secure telemetry pipelines through mTLS, secrets management, and zero-trust design patterns.
  • Produce and maintain high-quality technical documentation, standards, and best practices.
  • Engage with internal engineering teams to gather requirements, influence roadmap prioritization, and deliver platform improvements.
  • Provide technical leadership through mentorship, design reviews, architectural guidance, and cross-team collaboration with principal engineers and engineering leadership.

Required Qualifications
  • 7+ years of experience in Software Engineering, Platform Engineering, or SRE.
  • 5+ years of experience with observability practices, including SLIs/SLOs/SLAs, alerting, and incident management.
  • 5+ years building production-grade backend services in Go and/or Java.
  • 5+ years implementing and operating OpenTelemetry, including OTLP, semantic conventions, and instrumentation patterns.
  • 5+ years with cloud-native and containerized platforms (Docker, Kubernetes, Argo CD).
  • 5+ years working with public cloud platforms (AWS, GCP, or Azure).
  • 3+ years designing and scaling distributed, high-volume data pipelines.
  • 3+ years working with the Grafana OSS stack (e.g., Grafana, Loki, Tempo, Mimir) or comparable observability backends.
  • 3+ years with relational databases (PostgreSQL, MySQL).

Preferred Qualifications
  • Experience with service meshes and networking technologies such as Envoy and Istio.
  • Experience integrating or operating commercial observability platforms (Datadog, New Relic, AppDynamics, etc.).
  • Experience with streaming and data platforms such as Kafka, Pulsar, or similar technologies.
  • Familiarity with time-series, NoSQL, or analytical databases (ClickHouse, Bigtable, Cassandra, etc.).
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation.
  • Experience with cost optimization and capacity planning for large-scale telemetry systems.
  • Experience with chaos engineering, resiliency testing, or fault injection.
  • Background in security-aware platform design, including secure service-to-service communication.
  • Experience mentoring senior engineers and influencing platform standards across organizations.
  • Strong operational experience supporting 24x7 production systems, including on-call responsibilities.
  • Strong technical communication and cross-team collaboration skills.
  • Experience operating in regulated or compliance-heavy environments (e.g., healthcare, finance).

Education: Bachelor's degree from an accredited university, or equivalent work experience (high school diploma plus 4 years of relevant experience).

Vacancy posted 1 day ago