Observability Platform Engineer

Neuberger Berman

Observability Engineer

Neuberger's Technology team is seeking an Observability Engineer to lead and evolve our observability strategy across cloud and on-premises environments. You will serve as the primary owner and subject matter expert for our Datadog platform — building, scaling, and operating a comprehensive monitoring solution that continuously validates service health (24/7) across business-critical systems, including external websites and key infrastructure components (e.g., firewalls, OpenShift). You will design and implement end-to-end observability solutions spanning logs, metrics, traces, Service Level Objectives (SLOs), synthetic monitoring, and Real User Monitoring (RUM) to improve reliability, accelerate incident response, and deliver clear visibility into service performance.

This is an individual contributor role with strong Datadog engineering and scripting expectations — not a pure administrator role, though prior admin experience is beneficial. You will partner closely with application, SRE/DevOps, infrastructure, and security teams and serve as the internal champion and evangelist for Datadog adoption, standards, and best practices. The environment includes an active migration from OpenView to Datadog, with workflows integrating into ServiceNow for incident routing and escalation.

What You'll Do
  • Serve as the primary Datadog platform owner — architecting, building, and maintaining scalable observability solutions across cloud and on-prem environments (Windows and Linux/Unix), with direct ownership of monitoring capabilities for key applications and services.
  • Partner closely with application, DevOps, SRE/operations, infrastructure, and security teams to translate reliability goals into actionable Datadog monitoring strategies, dashboards, SLOs, and alerting frameworks.
  • Lead and execute the migration from OpenView to Datadog, ensuring continuity of coverage and an improvement in monitoring fidelity across all migrated services and infrastructure.
  • Develop and automate processes using Datadog's APIs, Terraform provider, and scripting (Python, PowerShell, Bash) to manage monitors, dashboards, alerts, and telemetry configuration at scale — ensuring consistency across Windows Server and Unix (Linux/Solaris) environments.
  • Implement and optimize Datadog APM, distributed tracing, log management, infrastructure monitoring, and Network Performance Monitoring (NPM) to provide full-stack visibility.
  • Build and evolve Datadog RUM and Synthetic Monitoring capabilities to track end-user experience and proactively validate availability of external-facing services and critical workflows.
  • Define and operationalize SLOs and error budgets within Datadog; drive alert noise reduction through correlation, enrichment, threshold tuning, and monitor dependency mapping.
  • Integrate Datadog with ServiceNow for incident/problem ticket routing and escalation; produce runbooks, post-incident reviews, and executive/operational dashboards to support reliability reporting.
  • Champion OpenTelemetry (OTel) adoption and drive consistent logging, metrics, and tracing standards across the engineering organization using Datadog as the central observability platform.
  • Onboard new applications and services into Datadog; provide guidance and enablement to engineering teams on instrumentation, agent deployment, and observability best practices.
  • Collaborate on platform cost optimization, data governance, and scaling strategies to ensure Datadog remains performant and cost-effective as the environment grows.
Required Skills and Experience
  • BS/BA in Computer Science, Information Systems, Engineering, or equivalent experience.
  • 5+ years in Observability, APM, SRE, or Platform Engineering — with at least 2–3 years of hands-on, production-grade Datadog experience.
  • Deep expertise across Datadog's core product suite: APM, Infrastructure Monitoring, Log Management, Synthetics, RUM, SLOs, Dashboards, Monitors, and Alerting.
  • Proficiency in both Windows Server and Unix (Linux/Solaris) environments, including agent deployment, service instrumentation, and OS-level performance analysis.
  • Strong scripting and automation skills (Python, PowerShell, Bash) with hands-on experience using the Datadog API/SDK and Terraform to manage observability configurations as code.
  • Solid understanding of distributed tracing, metrics pipelines, logging standards, and SLO/error budget frameworks within Datadog.
  • Experience integrating Datadog with cloud platforms (Azure and AWS) and centralizing cross-environment telemetry.
  • Demonstrated ability to reduce alert noise and MTTR through Datadog monitor tuning, correlation, and enrichment strategies.
  • Experience with ITSM integrations (e.g., Datadog → ServiceNow) and producing clear service maps, dependency views, and stakeholder-facing dashboards.
  • Excellent communication and stakeholder management skills, with the ability to translate technical observability concepts for non-technical audiences.
  • Strong documentation habits, attention to detail, and the ability to work both independently and collaboratively in a fast-paced environment.
Nice to Have
  • Datadog certifications (e.g., Datadog Fundamentals, APM, or Log Management).
  • Experience migrating from legacy monitoring platforms (e.g., OpenView, AppDynamics, Nagios) to Datadog.
  • Familiarity with .NET development (C#), including Datadog instrumentation patterns for .NET applications.
  • Experience in financial services or other regulated industries.
  • Familiarity with ITSM integrations and CMDB alignment for incident, problem, and change management workflows.
  • Experience with CI/CD pipeline integration, synthetic testing strategies, and Datadog-based performance/capacity analysis for latency-sensitive systems.
  • Knowledge of network monitoring concepts and Datadog NPM/NDM capabilities.
Hybrid Notice

This is a hybrid position. Currently, the hybrid work schedule for this position is 2–3 days in the office. Please understand that the hybrid schedule may be modified or eliminated at any time at Neuberger's discretion.

Vacancy posted 2 days ago
