Machine Learning Safety: Evaluation Research Engineer

$171.6k - $302.2k

Apple Inc.

Seattle, Washington, United States
Machine Learning and AI

This role supports the design and development of safety evaluation methodologies for generative and agentic AI features that enable users across the globe to interact with our media products and services.

Description

You will play an impactful role: shaping responsible AI and safety policies, evaluating fidelity to product safety requirements, creating risk assessments and taxonomies, curating exemplar safety evaluation datasets, and ensuring that evaluation frameworks are culturally and linguistically grounded. An ideal candidate possesses a strong understanding of issues in responsible AI and AI and society, as well as technology evaluation design principles and practices, and brings experience designing evaluations to support policies and/or product requirements, classification systems, and annotation and/or study participant guidelines.

Responsibilities

  • Taxonomy Development: Design, refine, and maintain safety-relevant taxonomies that capture risk categories, content types, and policy distinctions, in collaboration with subject matter experts who bring knowledge across languages and cultural contexts. You will work collaboratively to ensure taxonomies are comprehensive, internally consistent, and actionable for downstream evaluation work.
  • Policy-to-Data Translation: Develop and validate exemplar sets that illustrate taxonomy categories, edge cases, and boundary conditions. Collaborate with language and cultural experts to ensure exemplars are culturally appropriate and representative across target markets. Partner with policy, product, and engineering teams to translate responsible AI policies and guidelines into concrete data requirements, annotation schemas, and evaluation criteria that can be operationalized across markets.
  • Develop and maintain synthetic data generation pipelines to augment evaluation coverage, stress-test safety boundaries, and support evaluation in low-resource languages. Ensure synthetic data is diverse, representative, and validated against human-generated benchmarks.
  • Automated Judge Development: Shape the development, training, fine-tuning, and validation of automated judge models that can reliably score AI system outputs for safety and policy compliance. Develop calibration and agreement metrics to ensure judges meet human-parity benchmarks. Design and implement validation frameworks to assess the accuracy, reliability, and consistency of automated evaluation systems. Develop methods to detect drift, bias, and failure modes in automated judges across markets.
  • Scalable Analysis & Reporting Automation: Create automated pipelines for analysis and reporting that reduce manual effort, increase reproducibility, and enable rapid cross-market safety assessments. Build tooling that integrates with existing dashboards and reporting workflows.
  • Documentation & Communication: Produce clear, detailed documentation artifacts. Present findings and recommendations to cross-functional stakeholders including engineering, product, compliance, and policy teams.
  • Canonical Guideline Development: Author and maintain canonical evaluation guidelines that standardize task definitions, rating criteria, and edge-case handling. These assets will be adapted to scale across languages and markets, with the support of multilingual and operations experts. You will ensure guidelines are clear, complete, and adaptable.
  • Evaluation Design & Execution: Pilot and run evaluations with validated task setups, manage evaluation instruments, and surface issues before full-scale deployment. Design and run pilot evaluations to validate task setups, identify guideline ambiguities, and calibrate annotator understanding. Analyze pilot results and iterate on guidelines and configurations accordingly.
  • Monitoring & Data Quality: Develop and implement monitoring frameworks to track evaluation progress, annotator performance, inter-rater agreement, and data quality in real time. Flag anomalies and implement corrective actions to maintain data integrity across markets.

Minimum Qualifications

  • 4+ years of experience in an applied research setting related to evaluation design, AI ethics, Responsible AI, AI safety, computational social science, content analysis, or a closely related field.
  • Strong understanding of taxonomy design, classification systems, and annotation methodology.
  • Experience developing evaluation guidelines and exemplar sets for human annotation or labeling tasks.
  • Demonstrated ability to collaborate with subject matter experts (e.g., linguists, cultural consultants, multilingual annotators) to inform research design.
  • Able to work independently, with minimal direction, to drive outcomes among cross-functional teams.
  • Organized, highly attentive to detail, and manages time well.
  • Excellent written and oral communication skills.
  • Experience working in industry.
  • Advanced degree (MS/PhD) in Linguistics, Information Science, Computational Social Science, or a related socio-technical field.

Preferred Qualifications

  • Experience designing evaluation frameworks for multilingual or cross-cultural contexts.
  • Familiarity with responsible AI, AI safety, or content moderation policy frameworks.
  • Experience with experimental design methodologies, inter-rater reliability data analysis, and annotation quality assessment methods.
  • Prior experience working with localization, internationalization, or language service teams.
  • Experience with survey design, AI policy development, and/or structured content analysis methodologies.

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Machine Learning Safety: Evaluation Research Engineer in Seattle, WA vacancy
  • $315k

    We are looking for Research Engineers to build “gold standard” evaluations for catastrophic risks, in order to understand what AI Safety Level (ASL) to assign to models. Research leads...  ...experiments and iterate quickly to solve machine learning problems Thrive in collaborative... 
    Suggested
    Currently hiring
    Work at office
    Immediate start
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    Seattle, WA
    1 day ago
  • $159k - $231k

    Engineering Analyst, Trust and Safety, Messaging Google Kirkland, WA, USA Benefits Our comprehensive benefits...  ...of new launches and establish, evaluate, and maintain abuse protections....  .... Lead projects to productionize machine learning models, heuristic rules, and measure... 
    Suggested
    Full time
    Temporary work

    Google Inc.

    Kirkland, WA
    3 days ago
  •  ...drive quality improvements through data analysis and ML experiments. This role requires a BS and a significant background in machine learning, along with a collaborative mindset. The position is based in Seattle, Washington, and offers a competitive salary range and excellent... 
    Suggested

    Apple

    Seattle, WA
    3 days ago
  • $315k

    As a Research Engineer or Research Scientist in Applied Finetuning, you will directly train...  ...run experiments on data mixes, design evaluations, and improve our production model training...  ...state-of-the-art research in AI and machine learning, and propose ways to apply these... 
    Suggested
    Work at office
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    Seattle, WA
    1 day ago
  • $112.5k

     ...Laboratory (PNNL) is a world‑class research institution focused on...  ...into building research and engineering workflows to improve speed,...  ...Develop, fine‑tune, test, evaluate, and apply generative AI...  ...testing, and integrating AI or machine learning workflows into building... 
    Suggested
    Work experience placement

    Pacific Northwest National Laboratory

    Seattle, WA
    2 days ago
  • $89.3k

     ...PNNL) conducts transformative research in energy, environmental,...  ...Responsibilities Research Engineer II - AI for Building Energy...  ...Develop, fine‑tune, test, evaluate, and apply generative AI, large...  ...developing and applying AI or machine learning methods to building energy modeling... 

    Pacific Northwest National Laboratory

    Seattle, WA
    1 day ago
  • $132k - $189k

    Engineering Analyst, Payments Trust and Safety Google Seattle, WA, USA Benefits Health, dental, vision, life, disability insurance Retirement Benefits...  ...R, C/C++), statistical analysis (e.g., Scipy) and machine learning libraries (e.g., Scikit-learn, TF), Large Language... 
    Full time
    Temporary work

    Google Inc.

    Seattle, WA
    2 days ago
  • $139.5k - $258.1k

     ...LLM Machine Learning Research Engineer Apple is seeking a Research Engineer to join our Foundation Model Preparation and Algorithm Team. We are...  ...server optimization, ML tools/platforms, datasets, and evaluation. You will develop reliable and scalable pipelines and algorithms... 
    Relocation

    Apple

    Seattle, WA
    5 days ago
  • $139.5k - $258.1k

     ...Machine Learning Research Engineer, Text Generation, Input Experience Apple is where individual imaginations...  ...context-augmented text rewriting, safety-controlled text composition, free-...  ...parameter tuning in model training and evaluation, and reproducing research... 
    Relocation

    Apple

    Seattle, WA
    5 days ago
  • $171.6k - $258.1k

     ...Human Factors Research Engineer, AIML Data Operations Apple is where individual imaginations...  ...high-quality, human-annotated, machine learning data at scale for product teams across...  ...human factors research, especially in evaluating user interface designs ~ Expert in... 
    Relocation

    Apple

    Seattle, WA
    3 days ago
  • $220k - $325k

     ...Machine Learning Research Scientist / Engineer, Reasoning About Scale At Scale AI, our mission is to accelerate the development of AI applications...  ...Intelligence (AGI). Building on our history of model evaluation with enterprise and government customers, we are expanding... 
    Full time

    SCALE

    Seattle, WA
    2 days ago
  • $139.5k - $258.1k

     ...LLM Machine Learning Research Engineer, Model Optimization & Algorithms Development, SIML The Apple Intelligence Model Optimization and Algorithms...  ...experience Experimental rigor when training/evaluating DNNs for the purpose of benchmarking neural network optimization... 
    Relocation

    Apple

    Seattle, WA
    7 days ago
  • $171.6k - $302.2k

     ...Description In this role, you will develop, evaluate, and continuously improve Generative AI...  ...apply deep expertise in health-focused machine learning alongside rigorous evaluation...  ...fast-growing team of top scientists and engineers to build generative systems that meet the... 
    Relocation

    Apple Inc.

    Seattle, WA
    3 days ago
  • $189.6k - $237k

     ...platform has been powering MLEs, researchers, data scientists and...  ...and automatic training and evaluation of LLM's, as well as evaluation...  ...systems Strong software engineering skills, proficient in frameworks...  ..., retirement benefits, a learning and development stipend, and... 
    Full time

    Scale AI

    Seattle, WA
    7 hours ago
  • $350k

     ...growing group of committed researchers, engineers, policy experts, and...  ..., as we are continually evaluating for top talent to join our...  ...love to pair!) * Want to learn more about machine learning research * Care...  ...perspectives on our team. Your safety matters to us. To protect... 
    Full time
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    Seattle, WA
    1 day ago
  •  ...technology company is seeking a skilled engineer to improve search and discovery on Apple devices. You will engage in complex research, conduct experiments, and collaborate...  ...candidates have a strong background in machine learning and relevant industry experience. The position... 

    Apple

    Seattle, WA
    2 days ago
  • $132k - $189k

    Engineering Analyst, Google Ads, Trust and Safety Google Seattle, WA, USA Benefits Health, dental...  ...or web security research. 2 years of experience...  ...will build, leverage and evaluate models and operations to...  ...bonus, equity, or benefits. Learn more about benefits at Google... 
    Full time
    Temporary work

    Google Inc.

    Seattle, WA
    3 days ago
  • $350k

     ...growing group of committed researchers, engineers, policy experts, and business...  ..., alignment, and safety. As a Research Engineer on...  ...for model fine‑tuning and evaluation Develop tools to measure and...  ...proficiency in Python, deep learning frameworks, and distributed... 
    Work at office
    Visa sponsorship
    Flexible hours

    Menlo Ventures

    Seattle, WA
    3 days ago
  •  ...organization builds the shared machine learning and AI infrastructure that...  ...to production serving, evaluation, and monitoring, enabling reliable...  ...Overview As a Senior Research Scientist, you will lead applied...  ...infrastructure, feature engineering workflows, and monitoring... 
    Work experience placement
    Local area

    Plaid

    Seattle, WA
    2 days ago
  • $139.5k - $258.1k

     ...Machine Learning Research Engineer (Human Sensing), SIML - ISE The System Intelligence Machine Learning (SIML) organization is looking for Research Engineers with a strong foundation in Machine Learning and Computer Vision to develop the next generation of multi-modal... 
    Relocation

    Apple

    Seattle, WA
    6 days ago
  •  ...model pretraining to production serving, evaluation, and monitoring. Responsibilities:...  ...architecture to production serving. Doing research that ships: driving decisions from experimentation...  ...work amplifying the capabilities of engineers and product teams across Plaid. Helping... 
    Work experience placement
    Local area
    Immediate start

    Plaid

    Seattle, WA
    2 days ago
  •  ...Huawei Research America Cloud AI Team Position Huawei is a leading global information...  ...and qualified Computer Vision Research Engineers to conduct applied research and industrial practice in Computer Vision and Machine Learning. We are interested in all components of... 
    Worldwide

    Netpace

    Bellevue, WA
    7 hours ago
  • $197.3k - $313.7k

     ...efforts. Job Category Software Engineering Job Details About Salesforce...  ...directly enable world-class research and products used by millions...  ...spawn entirely new applications. Learn more about our work at...  ...services, like APIs, UIs, agentic evaluators, and more. Roll out scalable... 
    Full time

    Salesforce

    Seattle, WA
    1 day ago
  • $146.88k - $220.32k

     ...Who You Are: To thrive as a Research Engineer at Ai2, you'll bring a blend...  ...experience with deep learning and/or foundation models — whether...  ...preprocessing Designing, training, and evaluating multimodal models (vision +...  ...risk to the health or safety of themselves or others. The... 
    Full time
    Contract work
    Work at office
    Flexible hours
    Weekend work

    The Allen Institute for Artificial Intelligence

    Seattle, WA
    5 days ago
  • $176k - $255k

     ...accelerate progress in GenAI research. We are looking for Research Scientists and Research Engineers with expertise in LLM post-...  ...degree in Computer Science, Machine Learning, AI, or a related field....  ...ensure a fair and thorough evaluation of all applicants. About... 
    Full time
    Shift work

    Scale AI

    Seattle, WA
    more than 2 months ago
  • $124k - $280k

     ...PwC, our people in data and analytics engineering focus on leveraging advanced technologies...  ...Those in artificial intelligence and machine learning at PwC will focus on developing and...  ...collaborating closely with team members. We evaluate these factors thoughtfully to... 
    Full time
    H1b

    PwC

    Seattle, WA
    4 days ago
  • $140k - $150k

     ...AI practitioners and engineers. We harness the power...  ...About Job AI Research Engineer: Vision AI /...  ...known for advancing deep learning in vision tasks. This...  ...Reasoning with VLMs: Train/evaluate vision-language models...  ..., navigation, or safety workflows. Data &... 
    Full time
    Remote work

    Centific Global Solutions, Inc.

    Seattle, WA
    1 day ago
  •  ...AI/ML Engineer In Wholesale Payments Operations JPMorganChase runs the world's largest...  .... This role is in Applied AI and Machine Learning, partnering closely with Wholesale Payments...  ...that enable repeatable training, evaluation, and continuous improvement of models... 

    Chase

    Seattle, WA
    4 days ago
  • $80.24k

    Research Scientist/Engineer 3. Locations: Seattle, WA. Time type: Full time. Posted on: Posted Today. time...  ...Support of LSST Operations (25%): Evaluate the scientific data quality of LSST...  ...applicants are encouraged to learn more about DiRAC research areas at Benefits... 
    Full time
    Temporary work
    H1b
    Work at office
    Shift work
    Night shift

    FHLB Des Moines

    Seattle, WA
    3 days ago
  • $216.6k - $325.5k

    Sr. Research Manager, Evaluation Science Seattle, Washington, United States Software and Services AI...  ...psychometric rigor, and production engineering. What unites the strongest...  ...Qualifications Ph.D. in Computer Science, Machine Learning, Statistics, or a closely related field... 
    Immediate start
    Relocation

    Apple Inc.

    Seattle, WA
    2 days ago
