Measurement Scientist, AI Evaluation Platform

$171.6k - $258.1k

Apple Oakbrook


Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or App Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something — you'll add something.

Our team, part of Apple Services Engineering, is building the scientific foundation for how AI systems are evaluated across Apple. We are seeking a Measurement Scientist to ensure that our evaluation methods are not just sophisticated, but scientifically valid and trustworthy. In this role, you will apply psychometric theory, validity frameworks, and statistical rigor to establish measurement standards for AI evaluation, ensuring that when we claim an evaluator measures "helpfulness" or "safety," it actually does. We are looking for individuals across a range of experience levels.

This role uniquely bridges measurement science and cutting-edge AI evaluation. You will develop methods for validating LLM-as-judge evaluators, automated benchmarks, and human evaluations, and you will create statistical tools that help engineers trust their evaluation results. You will work on an interdisciplinary team with ML researchers to solve new problems in AI evaluation. Your work will be both published at top measurement and ML venues and productionized into the evaluation SDK used across Apple.

The successful candidate will have deep expertise in psychometrics and measurement theory, with the ability to apply these principles to novel AI evaluation challenges. You will work collaboratively with ML researchers, platform engineers, and evaluation practitioners to translate measurement science into practical tools that scale across the organization.

Responsibilities

Design validity frameworks for AI evaluation, ensuring that automated metrics, LLM-as-judge systems, and human evaluation protocols measure what they claim to measure across diverse contexts.
Develop and apply psychometric methods to assess the quality of benchmarks, drawing for example on frameworks such as item response theory (IRT).
Create calibration and bias detection systems for automated evaluators, ensuring LLM-as-judge scores are interpretable, consistent, and free from systematic biases.
Build robust statistical tools for practitioners for sample-size planning, quantifying uncertainty, controlling error rates, and visualizing data.
Establish measurement standards for evaluator transfer and generalization, including methods to quantify or predict when evaluators will maintain validity across domains, languages, or contexts.
Validate novel evaluation methods in collaboration with ML researchers, ensuring intelligent search algorithms discover statistically meaningful patterns and synthetic data generation produces representative samples.
Collaborate with platform engineers to productionize measurement methods into evaluation infrastructure, creating self-service tools for validity checking, reliability testing, and interpretable outputs (report cards, warnings, confidence metrics).
Publish research at top measurement venues and/or ML conferences (NeurIPS, ICML, ICLR), advancing both measurement science and AI evaluation.
Collaborate across disciplines with ML researchers developing novel methods, platform engineers building scalable infrastructure, and evaluation practitioners using these tools in production.

Minimum Qualifications

PhD in Psychometrics, Educational Measurement, Quantitative Psychology, Statistics, or equivalent research/work experience.
Deep expertise in modeling test data (IRT or related methods) and construct validation.
Strong statistical foundation including experimental design, power analysis, sampling theory, and uncertainty quantification.
Track record of designing and validating measurement instruments as demonstrated through publications or applied work.
Proficiency in Python (preferred) or R for statistical analysis, psychometric modeling, and method implementation.
Strong working knowledge of generative AI technology.
Excellent communication skills with the ability to explain complex measurement concepts to engineers, ML researchers, and non-technical stakeholders.

Preferred Qualifications

Experience applying measurement science to AI/ML evaluation, automated scoring systems, or computational assessment.
Knowledge of modern ML evaluation challenges including LLM-as-judge, automated metrics, benchmark design, and agentic systems.
Publications at measurement venues or top ML conferences (NeurIPS, ICML, ICLR).
Expertise in computational social or behavioral science using generative AI.
Experience collaborating with engineers to turn research methods into production tools and scalable infrastructure.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

At Apple, we believe accessibility is a fundamental human right. You'll find that idea reflected in everything here: in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.

Apple accepts applications to this posting on an ongoing basis.


Vacancy posted 2 days ago
