Measurement Scientist, AI Evaluation Platform
$171.6k - $258.1k
Apple Oakbrook
Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or App Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something — you'll add something.
Our team, part of Apple Services Engineering, is building the scientific foundation for how AI systems are evaluated across Apple. We are seeking a Measurement Scientist to ensure that our evaluation methods are not just sophisticated, but scientifically valid and trustworthy. In this role, you will apply psychometric theory, validity frameworks, and statistical rigor to establish measurement standards for AI evaluation, so that when we claim an evaluator measures "helpfulness" or "safety," it actually does. We are looking for individuals across a range of experience levels.
This role uniquely bridges measurement science and cutting-edge AI evaluation. You will develop methods for validating LLM-as-judge evaluators, automated benchmarks, and human evaluations, and you will create statistical tools that help engineers trust their evaluation results. You will work on an interdisciplinary team with ML researchers to solve new problems in AI evaluation. Your work will be both published at top measurement and ML venues and productionized into the evaluation SDK used across Apple.
The successful candidate will have deep expertise in psychometrics and measurement theory, with the ability to apply these principles to novel AI evaluation challenges. You will work collaboratively with ML researchers, platform engineers, and evaluation practitioners to translate measurement science into practical tools that scale across the organization.
Responsibilities
- Design validity frameworks for AI evaluation, ensuring that automated metrics, LLM-as-judge systems, and human evaluation protocols measure what they claim to measure across diverse contexts.
- Develop and apply psychometric methods to assess the quality of benchmarks, for example by drawing on frameworks like item response theory (IRT).
- Create calibration and bias detection systems for automated evaluators, ensuring LLM-as-judge scores are interpretable, consistent, and free from systematic biases.
- Build robust statistical tools that help practitioners plan sample sizes, quantify uncertainty, control error rates, and visualize data.
- Establish measurement standards for evaluator transfer and generalization, including methods to quantify or predict when evaluators will maintain validity across domains, languages, or contexts.
- Validate novel evaluation methods in collaboration with ML researchers, ensuring intelligent search algorithms discover statistically meaningful patterns and synthetic data generation produces representative samples.
- Collaborate with platform engineers to productionize measurement methods into evaluation infrastructure, creating self-service tools for validity checking, reliability testing, and interpretable outputs (report cards, warnings, confidence metrics).
- Publish research at top measurement venues and/or ML conferences (NeurIPS, ICML, ICLR), advancing both measurement science and AI evaluation.
- Collaborate across disciplines with ML researchers developing novel methods, platform engineers building scalable infrastructure, and evaluation practitioners using these tools in production.
Minimum Qualifications
- PhD in Psychometrics, Educational Measurement, Quantitative Psychology, Statistics, or equivalent research/work experience.
- Deep expertise in modeling test data (IRT or related methods) and construct validation.
- Strong statistical foundation including experimental design, power analysis, sampling theory, and uncertainty quantification.
- Track record of designing and validating measurement instruments as demonstrated through publications or applied work.
- Proficiency in Python (preferred) or R for statistical analysis, psychometric modeling, and method implementation.
- Strong working knowledge of generative AI technology.
- Excellent communication skills with the ability to explain complex measurement concepts to engineers, ML researchers, and non-technical stakeholders.
Preferred Qualifications
- Experience applying measurement science to AI/ML evaluation, automated scoring systems, or computational assessment.
- Knowledge of modern ML evaluation challenges including LLM-as-judge, automated metrics, benchmark design, and agentic systems.
- Publications at measurement venues or top ML conferences (NeurIPS, ICML, ICLR).
- Expertise in computational social or behavioral science using generative AI.
- Experience collaborating with engineers to turn research methods into production tools and scalable infrastructure.
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.
At Apple, we believe accessibility is a fundamental human right. You'll find that idea reflected in everything here: in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
Apple accepts applications to this posting on an ongoing basis.

