Machine Learning Safety: Evaluation Research Engineer
$171.6k - $302.2k
Apple Inc.
Seattle, Washington, United States
Machine Learning and AI

This role supports the design and development of safety evaluation methodologies for generative and agentic AI features that enable users across the globe to interact with our media products and services.

Description

You will play an impactful role: shaping responsible AI and safety policies, evaluating fidelity to product safety requirements, creating risk assessments and taxonomies, curating exemplar safety evaluation datasets, and ensuring that evaluation frameworks are culturally and linguistically grounded. An ideal candidate possesses a strong understanding of issues in responsible AI, AI and society, and technology evaluation design principles and practices, and brings experience designing evaluations to support policies and/or product requirements, classification systems, and annotation and/or study participant guidelines.

Responsibilities

Taxonomy Development: Design, refine, and maintain safety-relevant taxonomies that capture risk categories, content types, and policy distinctions, achieved through collaborations with subject matter experts who bring knowledge across languages and cultural contexts. You will work collaboratively to ensure taxonomies are comprehensive, internally consistent, and actionable for downstream evaluation work.

Policy-to-Data Translation: Develop and validate exemplar sets that illustrate taxonomy categories, edge cases, and boundary conditions. Collaborate with language and cultural experts to ensure exemplars are culturally appropriate and representative across target markets. Partner with policy, product, and engineering teams to translate responsible AI policies and guidelines into concrete data requirements, annotation schemas, and evaluation criteria that can be operationalized across markets.
Develop and maintain synthetic data generation pipelines to augment evaluation coverage, stress-test safety boundaries, and support evaluation in low-resource languages. Ensure synthetic data is diverse, representative, and validated against human-generated benchmarks.

Automated Judge Development: Shape the development, training and fine‑tuning, and validation of automated judge models that can reliably score AI system outputs for safety and policy compliance. Develop calibration and agreement metrics to ensure judges meet human‑parity benchmarks. Design and implement validation frameworks to assess the accuracy, reliability, and consistency of automated evaluation systems. Develop methods to detect drift, bias, and failure modes in automated judges across markets.

Scalable Analysis & Reporting Automation: Create automated pipelines for analysis and reporting that reduce manual effort, increase reproducibility, and enable rapid cross‑market safety assessments. Build tooling that integrates with existing dashboards and reporting workflows.

Documentation & Communication: Produce clear, detailed documentation artifacts. Present findings and recommendations to cross‑functional stakeholders including engineering, product, compliance, and policy teams.

Canonical Guideline Development: Author and maintain canonical evaluation guidelines that standardize task definitions, rating criteria, and edge‑case handling. These assets will be adapted to scale across languages and markets, with the support of multi‑lingual and operations experts. You will ensure guidelines are clear, complete, and adaptable.

Evaluation Design & Execution: Pilot and run evaluations with validated task setups, manage evaluation instruments, and surface issues before full‑scale deployment.
Design and run pilot evaluations to validate task setups, identify guideline ambiguities, calibrate annotator understanding, and surface issues before full‑scale deployment. Analyze pilot results and iterate on guidelines and configurations accordingly.

Monitoring & Data Quality: Develop and implement monitoring frameworks to track evaluation progress, annotator performance, inter‑rater agreement, and data quality in real time. Flag anomalies and implement corrective actions to maintain data integrity across markets.

Minimum Qualifications

- 4+ years of experience in an applied research setting related to evaluation design, AI ethics, Responsible AI, AI safety, computational social science, content analysis, or a closely related field.
- Strong understanding of taxonomy design, classification systems, and annotation methodology.
- Experience developing evaluation guidelines and exemplar sets for human annotation or labeling tasks.
- Demonstrated ability to collaborate with subject matter experts (e.g., linguists, cultural consultants, multi‑lingual annotators) to inform research design.
- Able to work independently to drive outcomes among cross‑functional teams, with minimal direction.
- Organized, highly attentive to detail, and manages time well.
- Excellent written and oral communication skills.
- Experience working in industry.
- Advanced degree (MS/PhD) in Linguistics, Information Science, Computational Social Science, or a related socio‑technical field.

Preferred Qualifications

- Experience designing evaluation frameworks for multilingual or cross‑cultural contexts.
- Familiarity with responsible AI, AI safety, or content moderation policy frameworks.
- Experience with experimental design methodologies, inter‑rater reliability data analysis, and annotation quality assessment methods.
- Prior experience working with localization, internationalization, or language service teams.
- Experience with survey design, AI policy development, and/or structured content analysis methodologies.
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.

