ML Engineer - Automated Evaluation and Adversarial Design
$139.5k - $258.1k | Apple Inc.
Seattle, Washington, United States
Software and Services

The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. This team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches.

This role focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single-turn interactions into multi-turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress-testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

Description
Day-to-day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale, including multi-turn conversation evaluation and end-to-end agent workflow testing. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions, with particular focus on failure modes that only emerge across extended interactions, such as context degradation, goal drift, and compounding errors. Typical deliverables include evaluation frameworks and rubrics, quality assessment reports, adversarial test case libraries, multi-turn stress-test pipelines, and recommendations on model readiness.
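To illustrate what it means to treat a whole conversation, rather than a single output, as the unit of evaluation, here is a minimal sketch. It assumes a hypothetical upstream grader has already scored each turn in [0, 1]. The `score_session` function, the 0.5 failure threshold, and the use of a least-squares slope as a degradation signal are illustrative assumptions, not a description of the team's actual tooling.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class SessionReport:
    mean_score: float
    min_score: float
    first_failure_turn: Optional[int]  # 0-based index of first turn below threshold
    longest_failure_run: int           # longest streak of consecutive failing turns
    degradation_slope: float           # least-squares slope of score vs. turn index


def score_session(turn_scores: List[float], threshold: float = 0.5) -> SessionReport:
    """Aggregate per-turn scores (assumed in [0, 1], from a hypothetical grader)
    into session-level signals."""
    if not turn_scores:
        raise ValueError("a session needs at least one scored turn")

    n = len(turn_scores)
    failing = [s < threshold for s in turn_scores]
    first_failure = next((i for i, f in enumerate(failing) if f), None)

    # Longest run of consecutive failing turns: a rough proxy for compounding errors.
    longest = run = 0
    for f in failing:
        run = run + 1 if f else 0
        longest = max(longest, run)

    # Ordinary least-squares slope of score over turn index;
    # a negative value suggests quality decays as the session grows.
    if n > 1:
        x_bar = (n - 1) / 2
        y_bar = mean(turn_scores)
        num = sum((i - x_bar) * (s - y_bar) for i, s in enumerate(turn_scores))
        den = sum((i - x_bar) ** 2 for i in range(n))
        slope = num / den
    else:
        slope = 0.0

    return SessionReport(mean(turn_scores), min(turn_scores), first_failure, longest, slope)
```

For example, `score_session([0.9, 0.8, 0.6, 0.4, 0.3])` reports a first failure at turn index 3 (0-based), a failing streak of 2, and a slope of about -0.16. The session mean of 0.6 alone would pass the threshold and hide that decay, which is why session-level signals complement per-turn scores.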
Responsibilities
- Define and own the automated evaluation approach for AI features, translating qualitative notions of quality into measurable, reproducible assessments across both single-turn and multi-turn agentic experiences
- Build adversarial test suites that target known and emerging model failure modes, including edge cases relevant to productivity application workflows and conversation-level failures such as context loss, instruction forgetting, and cascading errors across multi-step tasks
- Develop and execute stress test protocols that validate minimum performance thresholds under atypical input conditions, including extended conversation lengths, adversarial mid-conversation topic shifts, and complex tool-use sequences
- Ensure ongoing alignment between automated and human evaluation methods, identifying and resolving systematic disagreements
- Collaborate with engineering partners to integrate evaluation into development and release workflows
- Scale adversarial test case generation and stress test execution, leveraging automation where appropriate, including programmatic generation of multi-turn conversation scenarios and agent interaction traces
- Influence model and feature quality decisions by communicating evaluation findings and readiness assessments to cross-functional partners

Minimum Qualifications
- Bachelor’s degree in Computer Science, Machine Learning, Statistics, or a related field
- 4+ years of experience building or significantly extending ML evaluation systems, including designing evaluation benchmarks or quality assessment frameworks and evaluating sequential or multi-step AI outputs
- Experience independently defining evaluation architecture and methodology for AI or ML systems, with the ability to design evaluation approaches where the unit of analysis is a conversation or session rather than a single output
- Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features, including adversarial scenarios that target failures across multi-turn interactions
- Experience with Python and ML frameworks (PyTorch, TensorFlow, or equivalent) in production or near-production settings
- Track record of owning technical direction for evaluation efforts across multiple features or product areas

Preferred Qualifications
- Experience evaluating user-facing AI features in consumer applications, with an understanding of how technical metrics connect to user-perceived quality
- Familiarity with productivity software or creative tools, with the ability to assess output quality from a user workflow perspective
- Experience ensuring alignment between automated and human evaluation methods, including inter-annotator agreement analysis and bias detection
- Track record of designing evaluation systems that scale across multiple features or product areas without requiring bespoke solutions for each
- Experience evaluating different types of AI systems, including API-based and custom-trained models
- Demonstrated ability to communicate evaluation findings and readiness assessments to cross-functional partners
- Experience leveraging automation to scale evaluation data generation and analysis
- Experience building evaluation pipelines for conversational AI, dialogue systems, or agentic workflows, including turn-level and session-level automated scoring
- Familiarity with agent orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and observability tooling (LangSmith, Braintrust, Arize), with an understanding of how to instrument and evaluate multi-step agent runs
- Experience designing adversarial tests for tool-use reliability, function-calling accuracy, or agent planning quality

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role.
The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.
