Observability Platform Engineer
Neuberger Berman
Observability Engineer
Neuberger's Technology team is seeking an Observability Engineer to lead and evolve our observability strategy across cloud and on-premises environments. You will serve as the primary owner and subject matter expert for our Datadog platform, building, scaling, and operating a comprehensive monitoring solution that validates service health 24/7 across business-critical systems, including external websites and key infrastructure components (e.g., firewalls, OpenShift). You will design and implement end-to-end observability solutions spanning logs, metrics, traces, Service Level Objectives (SLOs), synthetic monitoring, and Real User Monitoring (RUM) to improve reliability, accelerate incident response, and deliver clear visibility into service performance.
This is an individual contributor role with strong Datadog engineering and scripting expectations — not a pure administrator role, though prior admin experience is beneficial. You will partner closely with application, SRE/DevOps, infrastructure, and security teams and serve as the internal champion and evangelist for Datadog adoption, standards, and best practices. The environment includes an active migration from OpenView to Datadog, with workflows integrating into ServiceNow for incident routing and escalation.
What You'll Do
- Serve as the primary Datadog platform owner — architecting, building, and maintaining scalable observability solutions across cloud and on-prem environments (Windows and Linux/Unix), with direct ownership of monitoring capabilities for key applications and services.
- Partner closely with application, DevOps, SRE/operations, infrastructure, and security teams to translate reliability goals into actionable Datadog monitoring strategies, dashboards, SLOs, and alerting frameworks.
- Lead and execute the migration from OpenView to Datadog, maintaining continuous coverage and improving monitoring fidelity across all migrated services and infrastructure.
- Develop and automate processes using Datadog's APIs, Terraform provider, and scripting (Python, PowerShell, Bash) to manage monitors, dashboards, alerts, and telemetry configuration at scale — ensuring consistency across Windows Server and Unix (Linux/Solaris) environments.
- Implement and optimize Datadog APM, distributed tracing, log management, infrastructure monitoring, and Network Performance Monitoring (NPM) to provide full-stack visibility.
- Build and evolve Datadog RUM and Synthetic Monitoring capabilities to track end-user experience and proactively validate availability of external-facing services and critical workflows.
- Define and operationalize SLOs and error budgets within Datadog; drive alert noise reduction through correlation, enrichment, threshold tuning, and monitor dependency mapping.
- Integrate Datadog with ServiceNow for incident/problem ticket routing and escalation; produce runbooks, post-incident reviews, and executive/operational dashboards to support reliability reporting.
- Champion OpenTelemetry (OTel) adoption and drive consistent logging, metrics, and tracing standards across the engineering organization using Datadog as the central observability platform.
- Onboard new applications and services into Datadog; provide guidance and enablement to engineering teams on instrumentation, agent deployment, and observability best practices.
- Collaborate on platform cost optimization, data governance, and scaling strategies to ensure Datadog remains performant and cost-effective as the environment grows.
Required Skills and Experience
- BS/BA in Computer Science, Information Systems, Engineering, or equivalent experience.
- 5+ years in Observability, APM, SRE, or Platform Engineering — with at least 2–3 years of hands-on, production-grade Datadog experience.
- Deep expertise across Datadog's core product suite: APM, Infrastructure Monitoring, Log Management, Synthetics, RUM, SLOs, Dashboards, Monitors, and Alerting.
- Proficiency in both Windows Server and Unix (Linux/Solaris) environments, including agent deployment, service instrumentation, and OS-level performance analysis.
- Strong scripting and automation skills (Python, PowerShell, Bash) with hands-on experience using the Datadog API/SDK and Terraform to manage observability configurations as code.
- Solid understanding of distributed tracing, metrics pipelines, logging standards, and SLO/error budget frameworks within Datadog.
- Experience integrating Datadog with cloud platforms (Azure and AWS) and centralizing cross-environment telemetry.
- Demonstrated ability to reduce alert noise and MTTR through Datadog monitor tuning, correlation, and enrichment strategies.
- Experience with ITSM integrations (e.g., Datadog → ServiceNow) and producing clear service maps, dependency views, and stakeholder-facing dashboards.
- Excellent communication and stakeholder management skills, with the ability to translate technical observability concepts for non-technical audiences.
- Strong documentation habits, attention to detail, and the ability to work both independently and collaboratively in a fast-paced environment.
Nice to Have
- Datadog certifications (e.g., Datadog Fundamentals, APM, or Log Management).
- Experience migrating from legacy monitoring platforms (e.g., OpenView, AppDynamics, Nagios) to Datadog.
- Familiarity with .NET development (C#), including Datadog instrumentation patterns for .NET applications.
- Experience in financial services or other regulated industries.
- Familiarity with ITSM integrations and CMDB alignment for incident, problem, and change management workflows.
- Experience with CI/CD pipeline integration, synthetic testing strategies, and Datadog-based performance/capacity analysis for latency-sensitive systems.
- Knowledge of network monitoring concepts and Datadog NPM/NDM capabilities.
Hybrid Notice
This is a hybrid position. Currently, the hybrid work schedule for this position is 2–3 days in the office. Please understand that the hybrid schedule may be modified or eliminated at any time at Neuberger's discretion.