ML Engineer - Automated Evaluation and Adversarial Design

$139.5k - $258.1k

Apple Oakbrook

Culver City, California, United States
Software and Services

The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. This team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches.

This role focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single-turn interactions into multi-turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress-testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

Description

Day-to-day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale, including multi-turn conversation evaluation and end-to-end agent workflow testing. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions, with particular focus on failure modes that only emerge across extended interactions, such as context degradation, goal drift, and compounding errors. Typical deliverables include evaluation frameworks and rubrics, quality assessment reports, adversarial test case libraries, multi-turn stress-test pipelines, and recommendations on model readiness.

Responsibilities

Define and own the automated evaluation approach for AI features, translating qualitative notions of quality into measurable, reproducible assessments across both single-turn and multi-turn agentic experiences
Build adversarial test suites that target known and emerging model failure modes, including edge cases relevant to productivity application workflows and conversation-level failures such as context loss, instruction forgetting, and cascading errors across multi-step tasks
Develop and execute stress test protocols that validate minimum performance thresholds under atypical input conditions, including extended conversation lengths, adversarial mid-conversation topic shifts, and complex tool-use sequences
Ensure alignment between automated and human evaluation methods on an ongoing basis, identifying and resolving systematic disagreements
Collaborate with engineering partners to integrate evaluation into development and release workflows
Scale adversarial test case generation and stress test execution, leveraging automation where appropriate, including programmatic generation of multi-turn conversation scenarios and agent interaction traces
Influence model and feature quality decisions by communicating evaluation findings and readiness assessments to cross-functional partners

Minimum Qualifications

Bachelor’s degree in Computer Science, Machine Learning, Statistics, or a related field
4+ years of experience building or significantly extending ML evaluation systems, including designing evaluation benchmarks or quality assessment frameworks and evaluating sequential or multi-step AI outputs
Experience independently defining evaluation architecture and methodology for AI or ML systems, with the ability to design evaluation approaches where the unit of analysis is a conversation or session rather than a single output
Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features, including adversarial scenarios that target failures across multi-turn interactions
Experience with Python and ML frameworks (PyTorch, TensorFlow, or equivalent) in production or near-production settings
Track record of owning technical direction for evaluation efforts across multiple features or product areas

Preferred Qualifications

Experience evaluating user-facing AI features in consumer applications, with an understanding of how technical metrics connect to user-perceived quality
Familiarity with productivity software or creative tools, with the ability to assess output quality from a user workflow perspective
Experience ensuring alignment between automated and human evaluation methods, including inter-annotator agreement analysis and bias detection
Track record of designing evaluation systems that scale across multiple features or product areas without requiring bespoke solutions for each
Experience evaluating different types of AI systems, including API-based and custom-trained models
Demonstrated ability to communicate evaluation findings and readiness assessments to cross-functional partners
Experience leveraging automation to scale evaluation data generation and analysis
Experience building evaluation pipelines for conversational AI, dialogue systems, or agentic workflows, including turn-level and session-level automated scoring
Familiarity with agent orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and observability tooling (LangSmith, Braintrust, Arize), with an understanding of how to instrument and evaluate multi-step agent runs
Experience designing adversarial tests for tool-use reliability, function-calling accuracy, or agent planning quality

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Vacancy posted 1 day ago

