Engineering Manager AI Observability
Netflix
At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling, global fandom and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what's next.
AI and ML power innovation in all areas of the business, including helping members choose the right title for them through personalization, better understanding our audience and our content slate, creating high-quality subtitles, dubbings, images, trailers, and other assets, optimizing our payment processing, and much more. The Artificial Intelligence Platform (AIP) organization builds highly scalable, differentiated AI infrastructure to maximize the business impact of all AI/ML practitioners at Netflix, which is key to accelerating this innovation.
The Opportunity
The AI Observability team makes AI, ML, and Agentic systems transparent, reliable, and production-ready at scale. We build end-to-end observability for ML and GenAI workloads, capturing model inputs, features, predictions, outcomes, and behavior across online and batch systems. Our platform enables teams to monitor model performance, data quality, drift, latency, and failures, turning the ML system from a black box into an explainable, debuggable system. We provide developer-friendly libraries, dashboards, and alerts so teams can debug issues, respond to incidents, and ship AI-powered products with confidence.
We are looking for an experienced AI/ML infrastructure engineering leader to build and lead the next generation of our AI observability platform. You will lead this newly formed team to architect, design, develop, test, and launch a brand-new platform to enable ML practitioners across different business domains to effortlessly collect model inputs, features, and predictions for thousands of large-scale models, including Large Language Models (LLMs), computer vision, and foundation models.
We are a highly collaborative team. You will be highly cross-functional in partnering with other engineering, product management, machine learning, and data teams to take Netflix's AI/ML initiatives to the next level. To succeed in this role, you will need a strong background in AI infrastructure and a passion for building scalable, robust systems that enable and accelerate the application of AI Observability to large, complex ML models across diverse domains.
In this role, you will:
- Partner with ML researchers, engineers, and platform teams to embed "observability-by-default" into new AI services, ensuring telemetry, monitoring, and evaluation are built into systems from day one.
- Lead the end-to-end observability strategy for AI workloads, including LLMs, generative AI systems, and classical ML models, driving build-vs-buy decisions and scaling solutions across model training, online inference, and agent orchestration.
- Drive the evolution of LLM evaluation frameworks, covering prompt instrumentation, response quality measurement, grounding correctness, hallucination rates, and human/LLM-as-a-judge scoring.
- Define and execute a platform roadmap focused on incremental delivery, with clear success metrics, migration goals, and strong adoption across teams.
- Communicate progress to stakeholders, customers, and senior leadership.
- Hire, grow, and mentor a high-performing engineering team while fostering an inclusive and collaborative culture.
To succeed in this role, you will need:
- 10+ years of software engineering experience and 3+ years of management experience.
- Experience leading teams responsible for building high-traffic distributed systems and ML infrastructure.
- Deep familiarity with AI and ML operations, including model evaluation, drift detection, and continuous monitoring at scale.
- Experience with AI observability and monitoring tools (e.g., Arize AI, Fiddler AI, Weights & Biases, Vertex AI Model Monitoring, SageMaker Model Monitor).
- Exposure to LLM or generative AI systems, including prompt/result logging, evaluation metrics, LLM-as-a-judge frameworks, and human-in-the-loop review.
- Strong technical acumen, with the ability to act as a credible technical advisor to the team, set and enforce a high quality bar for code and system design, and mentor the team.
- Strong communication and collaboration skills, and the ability to build strong relationships with internal customers and external partners.
- A demonstrated ability to develop, drive, and execute a technical vision and roadmap.
- Experience managing a hybrid team with partners and team members distributed across (US) geographies & time zones.
To learn more about our AI Platform, you can review the relevant talks/blog posts on the Netflix AI Platform Research website.
Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $523,000.00 - $920,000.00.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.
Netflix is a unique culture and environment. Learn more here.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

