
Annotation Data Scientist, Evaluation Integrity (Siri)

$154.6k - $274.9k

Apple Oakbrook

Cambridge, Massachusetts, United States
Machine Learning and AI

Play a part in the ongoing revolution in human-computer interaction. Siri is evolving, and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release.

Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation: the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions.

Description

As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end-to-end: from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams.

Responsibilities

  • Design HITL annotation tasks for agentic evaluation. Advise on rubrics and design workflows that ask annotators to assess (a) the quality and authenticity of user agent personae, (b) the validity of agent-to-agent conversations, and (c) whether agentic evaluators' verdicts align with Siri's product specifications and human interface guidelines.
  • Author, maintain, and iterate on annotation guidelines. Translate evolving Siri capabilities and product specs into clear, defensible rubrics for human grading aligned with agentic evaluators; run calibration sessions; monitor inter-annotator agreement; and refine guidelines based on edge cases surfaced during grading.
  • Manage multiple annotation programs in parallel. Plan, scope, and manage human evaluation tasks end-to-end: requirements gathering, annotator coordination, vendor management, timeline tracking, and stakeholder delivery.
  • Design custom annotation tooling in partnership with software engineers. Prototype task UIs, specify tool requirements, and collaborate with tooling engineers on the annotation platforms the Human Evaluation team relies on.
  • Apply data science rigor to human-labeled data. Use Python to build analysis pipelines that measure evaluator accuracy against the annotator pool, surface discrepancies between LLM-judge and rule-based evaluators, and quantify the reliability of each agentic evaluator as a source of truth.
  • Turn annotator feedback into evaluator improvements. Close the loop between annotators and the data scientists and software engineers who own user agents and automated evaluators, feeding findings back into prompts, rubrics, and product guidelines.
  • Contribute to the organization-wide eval health story. Partner with the User Feedback and Eval Science sub-team to ensure human signal is represented in the eval health report delivered to leadership.

Minimum Qualifications

  • Bachelor's or Master's degree in a quantitative or related field such as Data Science, Computer Science, Linguistics, Statistics, or Cognitive Science, or equivalent job-related experience.
  • 3+ years of hands-on experience working with human-annotated datasets or human-in-the-loop evaluation methodologies for machine learning, natural language processing, or large language model systems.
  • 3+ years of experience using Python for data processing, analysis, and prototyping, including experience with libraries such as pandas, Jupyter, and at least one data visualization library.
  • Experience designing, implementing, and communicating annotation schemas, rubrics, or ontologies for machine learning training or evaluation data.
  • Experience managing multiple concurrent dataset curation efforts, including scoping work, iterating on guidelines, coordinating with in-house or vendor annotators, and monitoring annotator performance metrics such as accuracy, throughput, and inter-annotator agreement.
  • Experience specifying or designing custom annotation tooling in collaboration with software engineers.

Preferred Qualifications

  • Experience evaluating LLM-powered or agentic systems, including familiarity with LLM-as-judge methodologies, rubric-based grading, or trajectory and tool-call evaluation.
  • Familiarity with statistical methods that address accuracy and variability in human annotation data, such as inter-annotator agreement, Cohen's or Fleiss' kappa, Krippendorff's alpha, or bootstrapping.
  • Data-querying experience with SQL, Spark, or similar, and comfort working with large, complex, real-world datasets.
  • Experience building pre-ship evaluation pipelines for conversational or assistant products.
  • Experience with prompt engineering, or with designing simulated user personae for agent evaluation.
  • Experience running annotation programs across multiple locales or at large scale.
  • Excellent written and verbal communication skills, with the ability to explain technical topics clearly to data scientists, engineers, annotators, and cross-functional partners.
  • Proven ability to collaborate effectively across functions and drive projects of varying sizes and scopes, knowing when to dive deep and when to delegate.

Pay and Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $154,600 and $274,900, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here, in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong. Learn about accessibility in Apple’s workplace. Learn about reasonable accommodations for job applicants.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Annotation Data Scientist, Evaluation Integrity (Siri) in Cambridge, MA vacancy
  • $154.6k - $274.9k

     ...Annotation Data Scientist, Evaluation Integrity (Siri) Play a part in the ongoing revolution in human-computer interaction. Siri is evolving — and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal... 
    Suggested
    Relocation

    Apple

    Cambridge, MA
    5 days ago
  •  ...Apple Inc. is seeking an Annotation Data Scientist for the Evaluation Integrity team in Cambridge, Massachusetts. This role focuses on designing human-in-the-...  ...HITL) annotation projects that evaluate the quality of Siri interactions and systems. The ideal candidate will... 
    Suggested

    Apple

    Cambridge, MA
    2 days ago
  •  ...Overview: Responsibilities: We are seeking an experienced Data Integration Developer to design, develop, and support custom database and ETL applications for Global Sales & Marketing within the Global Distribution team. This role involves working independently... 
    Suggested

    Purple Drive

    Boston, MA
    6 days ago
  • $36 - $41 per hour

     ...A leading global biopharmaceutical firm is looking for an AI Data Integration Engineer (RAG Systems). This hybrid position involves developing AI-enabled assistants and supporting engineering workflows. Candidates should have experience in data engineering, awareness... 
    Suggested
    Permanent employment
    Contract work

    3 Key Consulting

    Cambridge, MA
    1 day ago
  •  ...Job Title: AI Data Integration Engineer (RAG Systems) - Hybrid(JP15177) Location: Cambridge, MA 02138 Employment Type: Contract Business Unit: ETA Drug Substance Duration: 1+ year with likely extensions and/or conversion to permanent Posting Date: 4/1/2026 Pay Rate: $... 
    Suggested
    Permanent employment
    Contract work
    Work at office
    Local area

    3 Key Consulting

    Cambridge, MA
    2 days ago
  •  ...We are seeking a highly motivated and experienced Data Integration Developer to support Global Distribution projects, with a primary focus on Global Sales & Marketing systems. This role involves developing and supporting custom database and ETL applications, collaborating... 

    Compunnel

    Boston, MA
    2 days ago
  •  ...A technology solutions company is seeking a highly motivated Data Integration Developer to support Global Sales and Marketing systems. This role involves developing custom database and ETL applications, collaborating with users, and ensuring compliance with regulatory... 

    Compunnel

    Boston, MA
    2 days ago
  •  ...Integration (Data) Engineer Location: Mexico Duration: 3+ months About Job: We are seeking a talented Integration Engineer with a strong background in data integration, ETL (Extract, Transform, Load) processes, and data pipelines. The Integration Engineer will... 

    Saviance

    Boston, MA
    6 days ago
  • $110k - $130k

     ...meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins. Position Summary The Data Integration Engineer will install back-end, automated data integrations to customer electronic medical records, billing, data warehouses... 
    Work experience placement
    Work at office
    Remote work
    Flexible hours
    Afternoon shift

    Caris Life Sciences

    Boston, MA
    2 days ago
  • $175k - $200k

     ...Data Integration Engineer - Healthcare Startup Boston, Massachusetts, United States $ 175,000.00 - 200,000.00 (US Dollar) Our client is a venture-backed company that has created a cutting-edge system designed to enhance the analysis and understanding of electronic... 
    H1b
    Flexible hours

    Tech Stars Group LLC

    Boston, MA
    4 days ago
  • $129k - $209k

     ...Join Evolv as Senior Data Infrastructure Engineer...  ...model training, evaluation, and continuous improvement...  ...Functions, SageMaker integrations). Introduce...  ...labeling services and annotation workflows. Enable...  ...with AI/ML engineers, scientists, and data scientists... 
    Full time
    Work at office
    Flexible hours
    3 days per week

    Evolv Technology

    Watertown, MA
    7 days ago
  •  ...About the Role We're looking for a Data Scientist to own the quality, reliability, and trustworthiness of our clinical AI outputs. You'll...  ...systems that ensure our AI "knows what it doesn't know"—developing evaluation frameworks, calibrated confidence scoring, and automated... 

    Bioscope.ai, Inc.

    Boston, MA
    1 day ago
  • $190k - $258k

     ...Data Scientist We are seeking an experienced and highly skilled data...  ...for training and evaluation data powering the perception...  ...Determine trade-offs and integrations between human-labeled, human...  ...data collection, including annotation task design $190,000 -... 
    Temporary work
    Relocation package

    Zoox

    Boston, MA
    4 days ago
  •  ...specialty firms. IRI has built its reputation on excellent service and integrity since its inception in 1996. Our mission centers on delivering...  ..., Rehabilitation Therapy and Nursing. Job Description Title: Data Warehouse Specialist I Location: BOSTON, MA Duration: 6 Months... 

    Integrated Resources

    Boston, MA
    3 days ago
  •  ...Boston, MA. The ideal candidate should have over 5 years of experience in software development and integration, particularly with APIs and SQL. Experience in healthcare data management and familiarity with MDM platforms like IBM Infosphere are preferred. Strong... 

    Polarits

    Boston, MA
    2 days ago
  •  ...Manufacturing, Laboratory, and Enterprise platforms. This role will collaborate with cross-functional teams to ensure compliance with data integrity, validation, regulatory, and quality requirements throughout the system lifecycle, including implementation, enhancements,... 

    Creative Solutions Services, LLC

    Boston, MA
    4 days ago
  •  ...manufacturing, and quality by delivering data-driven, scalable, and compliant software...  ...with the ability to work directly with scientists performing assay-based experiments. The...  ...robust data components, scientific system integrations, AI-enabled insights, and next-... 
    Relocation
    Flexible hours

    Zifo

    Boston, MA
    7 days ago
  •  ...4 This is hybrid from day-1 Description: Overview We are looking to add talented informatica (IDMC) data integration engineers to our high-performing team to augment our collective efforts on a high visibility team Qualifications ~... 

    ShiftCode Analytics

    Boston, MA
    2 days ago
  •  ...Engineering team in Boston, MA. This role involves managing a team of engineers and overseeing the technical foundation for data, analytics, APIs, and integrations. The ideal candidate will have strong technical depth, a proven track record in team leadership, and the ability to... 

    ABCorp NA Inc.

    Boston, MA
    1 day ago
  • $120k - $130k

     ...industry expertise and unmatched data resources, Shift provides...  ...consisting of over 200+ Data Scientists throughout the world. Our...  ...the next generation of payment integrity solutions. Create custom "...  ...data. Establish rigorous evaluation frameworks (LLM-as-a-judge)... 
    Permanent employment
    Full time
    Apprenticeship
    Internship
    Remote work
    Flexible hours
    Shift work

    Shift Technology

    Boston, MA
    19 days ago
  • $110k - $130k

     ...looking for a Clinical Data Engineer who will own...  ...research associates and data scientists. You will operate at...  ...Ensure data integrity, reproducibility, and...  ...behavioral datasets to evaluate product performance and...  ...applications for visualizing and annotating biometric data... 
    Full time
    Immediate start
    Worldwide
    Flexible hours

    Eight Sleep

    Boston, MA
    16 hours ago
  • $176k - $240k

    As a Data Scientist on the Behavior Evaluation team, you will be the statistical anchor ensuring our autonomous driving systems navigate highway environments with world-class safety, efficiency, and comfort. Highway evaluation presents a unique industry challenge: verifying... 
    Full time
    Temporary work
    Relocation package

    Zoox

    Boston, MA
    9 hours ago
  •  ...March 2026, we announced positive topline data from our Phase 3 X-TOLE2 study of...  ...growth plans, we continue to build a fully integrated, premier neuroscience company with strong...  ...consultation with assigned stakeholders, evaluate, select, implement, and govern integration... 
    Temporary work
    Work at office
    2 days per week

    Xenon Pharmaceuticals Inc.

    Needham Heights, MA
    3 days ago
  • $152.6k - $190.7k

     ...Empowers you to Build your Future As a Lead Data Scientist at Lennar, you will design, build, and...  ..., model and agent development, evaluation, production deployment, and continuous...  ...ensuring scalability and reliability. Integrate agents with enterprise systems and protocols... 
    Work at office
    Local area

    Lennar Homes

    Boston, MA
    3 days ago
  •  ...using Informatica PowerCenter, Informatica Data Quality, and Informatica Master Data...  ...database changes. Meets with vendors, evaluates products, and makes recommendations...  ...Informatica PowerCenter or equivalent data integration software. Development experience with... 

    CERES Group

    Boston, MA
    3 days ago
  • $160k - $220k

     ...Lead Data Engineer Deliberate AI | Hybrid (NYC or Boston) | Full-Time About Deliberate...  ...signal processing and wearable API integrations — and you understand that both feed into...  ...time zones and connectivity conditions Evaluate and select the core data stack —... 
    Full time
    Worldwide
    Relocation
    Flexible hours
    Shift work
    Night shift
    Day shift

    Deliberate AI

    Boston, MA
    4 days ago
  • $130k - $185k

     ...Summary Xiphos Partners is seeking a versatile AI Integration Engineer to work at the intersection of data science, software engineering, and applied artificial...  ...engineering, model selection, training, and evaluation -against business KPIs and mission outcomes. Establish... 

    Xiphos Partners, Inc

    Cambridge, MA
    2 days ago
  • $142.3k - $195.7k

     ..., and control to develop and evaluate agents that can set, adapt, and...  ...Work across teams to integrate goal alignment with safety, alignment...  ...in SQL, Python, and data analysis/data mining tools....  ...engineering or an applied research scientist position preferably with a focus... 
    Bi-weekly pay
    Full time
    Temporary work
    Apprenticeship
    Work at office
    Remote work
    Work from home
    Home office

    Humana

    Boston, MA
    2 days ago
  • $77k - $202k

     ...Specialty/Competency: Data, Analytics & AI Industry/Sector: Not Applicable Time...  ...and implementing data pipelines, data integration, and data transformation solutions....  ...collaborating closely with team members. We evaluate these factors thoughtfully to establish... 
    Full time
    H1b

    PwC

    Boston, MA
    3 days ago
  • $155k - $410k

     ...Requirements: Up to 100% At PwC, our people in integration and platform architecture focus on...  ...for clients. They enable efficient data flow and optimise technology infrastructure...  ...closely with team members. We evaluate these factors thoughtfully to establish... 
    Full time
    Temporary work
    Work experience placement
    H1b

    PwC

    Boston, MA
    6 days ago
