ML Engineer - Automated Evaluation and Adversarial Design
$139.5k - $258.1k
Apple Inc.
Seattle, Washington, United States
Software and Services

The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. This team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches.

This role focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single-turn interactions into multi-turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress-testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

Description

Day-to-day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale, including multi-turn conversation evaluation and end-to-end agent workflow testing. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions, with particular focus on failure modes that only emerge across extended interactions, such as context degradation, goal drift, and compounding errors. Typical deliverables include evaluation frameworks and rubrics, quality assessment reports, adversarial test case libraries, multi-turn stress-test pipelines, and recommendations on model readiness.
Responsibilities

- Define and own the automated evaluation approach for AI features, translating qualitative notions of quality into measurable, reproducible assessments across both single-turn and multi-turn agentic experiences
- Build adversarial test suites that target known and emerging model failure modes, including edge cases relevant to productivity application workflows and conversation-level failures such as context loss, instruction forgetting, and cascading errors across multi-step tasks
- Develop and execute stress test protocols that validate minimum performance thresholds under atypical input conditions, including extended conversation lengths, adversarial mid-conversation topic shifts, and complex tool-use sequences
- Ensure alignment between automated and human evaluation methods on an ongoing basis, identifying and resolving systematic disagreements
- Collaborate with engineering partners to integrate evaluation into development and release workflows
- Scale adversarial test case generation and stress test execution, leveraging automation where appropriate, including programmatic generation of multi-turn conversation scenarios and agent interaction traces
- Influence model and feature quality decisions by communicating evaluation findings and readiness assessments to cross-functional partners

Minimum Qualifications

- Bachelor’s degree in Computer Science, Machine Learning, Statistics, or a related field
- 4+ years of experience building or significantly extending ML evaluation systems, including designing evaluation benchmarks or quality assessment frameworks and evaluating sequential or multi-step AI outputs
- Experience independently defining evaluation architecture and methodology for AI or ML systems, with the ability to design evaluation approaches where the unit of analysis is a conversation or session rather than a single output
- Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features, including adversarial scenarios that target failures across multi-turn interactions
- Experience with Python and ML frameworks (PyTorch, TensorFlow, or equivalent) in production or near-production settings
- Track record of owning technical direction for evaluation efforts across multiple features or product areas

Preferred Qualifications

- Experience evaluating user-facing AI features in consumer applications, with an understanding of how technical metrics connect to user-perceived quality
- Familiarity with productivity software or creative tools, with the ability to assess output quality from a user workflow perspective
- Experience ensuring alignment between automated and human evaluation methods, including inter-annotator agreement analysis and bias detection
- Track record of designing evaluation systems that scale across multiple features or product areas without requiring bespoke solutions for each
- Experience evaluating different types of AI systems, including API-based and custom-trained models
- Demonstrated ability to communicate evaluation findings and readiness assessments to cross-functional partners
- Experience leveraging automation to scale evaluation data generation and analysis
- Experience building evaluation pipelines for conversational AI, dialogue systems, or agentic workflows, including turn-level and session-level automated scoring
- Familiarity with agent orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and observability tooling (LangSmith, Braintrust, Arize), with an understanding of how to instrument and evaluate multi-step agent runs
- Experience designing adversarial tests for tool-use reliability, function-calling accuracy, or agent planning quality

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role.
The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Apple accepts applications to this posting on an ongoing basis.

