ML Engineer - Evaluation Analysis, Metric and Data Strategy
Apple Oakbrook
Weekly Hours: 40
Role Number: 200657984-3337
Summary
The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. This team serves as the primary evaluation function, and its analysis directly informs decisions about model development, feature launches, and product direction. This role is the analytical core of the team, responsible for making sense of evaluation signals and real-world user behavior. The work involves designing feature-level quality metrics, collaborating with partner teams on data collection strategies, and translating evaluation data into concise, actionable insights that drive decisions. This is an opportunity to define how AI feature quality is measured and to directly shape what gets shipped. As AI features evolve into multi-turn, agentic experiences, this role will define what “quality” means when the unit of evaluation is a conversation, not a single response.
Description
Day-to-day work involves analyzing evaluation results and identifying trends, regressions, and segment-level patterns across multiple AI features. This includes collaborating with partner teams on data collection strategies, ensuring evaluation data is representative of real-world usage, and designing the metrics framework that leadership uses to make decisions on AI features. Typical deliverables include feature-level quality metrics and dashboards, evaluation analysis reports, data collection requirements, dataset representativeness audits, multi-turn evaluation frameworks and session-level scoring rubrics, and concise metric summaries for decision-makers.
Minimum Qualifications
Bachelor’s degree in Statistics, Data Science, Applied Mathematics, Computer Science, or a related quantitative field
5+ years of experience in applied science, data science, or evaluation research, with a focus on defining and operationalizing quality metrics
Experience with statistical analysis methods including significance testing, sampling design, effect size estimation, and experimental design
Experience working with production user data and understanding its biases and limitations relative to controlled evaluation data, including familiarity with sequential interaction data where context and turn order affect quality assessment
Ability to design evaluation approaches where the unit of analysis is a session or conversation rather than a single model output
Track record of independently designing metrics frameworks and driving data-informed decisions across cross-functional teams
Proficiency in Python (pandas, scipy, scikit-learn) or R for data analysis and visualization
Preferred Qualifications
Experience designing evaluation or quality metrics for AI-powered or ML-driven features in consumer-facing products
Familiarity with productivity software or creative applications, with an ability to distinguish between technically correct and genuinely useful AI outputs
Experience partnering with engineering or data teams to define data collection requirements and schemas
Track record of translating complex analytical findings into concise recommendations for non-technical decision-makers
Experience evaluating tool-use accuracy, retrieval quality, or function-calling reliability within AI systems
Experience with evaluation methodology including inter-annotator agreement, evaluation bias detection, and dataset representativeness auditing
Familiarity with agentic orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and emerging agent interoperability protocols (A2A, MCP), with an understanding of how architectural choices in agent design affect evaluability
Understanding of ML model development processes, with the ability to specify what evaluation signals are useful for model improvement
Experience managing evaluation across multiple features or product areas simultaneously, with systematic rather than ad-hoc approaches
Graduate degree in a relevant quantitative field

