ML Engineer - Automated Evaluation and Adversarial Design

$139.5k - $258.1k

Apple Inc.

San Diego, California, United States
Software and Services

The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. This team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches. This role focuses on building and scaling automated evaluation systems and designing adversarial and stress‑testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single‑turn interactions into multi‑turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress‑testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

Description
Day‑to‑day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale, including multi‑turn conversation evaluation and end‑to‑end agent workflow testing. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions, with particular focus on failure modes that emerge only across extended interactions, such as context degradation, goal drift, and compounding errors. Typical deliverables include evaluation frameworks and rubrics, quality assessment reports, adversarial test case libraries, multi‑turn stress‑test pipelines, and recommendations on model readiness.

Responsibilities
Define and own the automated evaluation approach for AI features, translating qualitative notions of quality into measurable, reproducible assessments across both single‑turn and multi‑turn agentic experiences.
Build adversarial test suites that target known and emerging model failure modes, including edge cases relevant to productivity application workflows and conversation‑level failures such as context loss, instruction forgetting, and cascading errors across multi‑step tasks.
Develop and execute stress test protocols that validate minimum performance thresholds under atypical input conditions, including extended conversation lengths, adversarial mid‑conversation topic shifts, and complex tool‑use sequences.
Ensure alignment between automated and human evaluation methods on an ongoing basis, identifying and resolving systematic disagreements.
Collaborate with engineering partners to integrate evaluation into development and release workflows.
Scale adversarial test case generation and stress test execution, leveraging automation where appropriate, including programmatic generation of multi‑turn conversation scenarios and agent interaction traces.
Influence model and feature quality decisions by communicating evaluation findings and readiness assessments to cross‑functional partners.

Minimum Qualifications
Bachelor’s degree in Computer Science, Machine Learning, Statistics, or a related field.
4+ years of experience building or significantly extending ML evaluation systems, including designing evaluation benchmarks or quality assessment frameworks and evaluating sequential or multi‑step AI outputs.
Experience independently defining evaluation architecture and methodology for AI or ML systems, including the ability to design evaluation approaches where the unit of analysis is a conversation or session rather than a single output.
Experience designing adversarial or red‑teaming test methodologies for ML models or AI‑powered features, including adversarial scenarios that target failures across multi‑turn interactions.
Experience with Python and ML frameworks (PyTorch, TensorFlow, or equivalent) in production or near‑production settings.
Track record of owning technical direction for evaluation efforts across multiple features or product areas.

Preferred Qualifications
Experience evaluating user‑facing AI features in consumer applications, with an understanding of how technical metrics connect to user‑perceived quality.
Familiarity with productivity software or creative tools, with the ability to assess output quality from a user workflow perspective.
Experience ensuring alignment between automated and human evaluation methods, including inter‑annotator agreement analysis and bias detection.
Track record of designing evaluation systems that scale across multiple features or product areas without requiring bespoke solutions for each.
Experience evaluating different types of AI systems, including API‑based and custom‑trained models.
Demonstrated ability to communicate evaluation findings and readiness assessments to cross‑functional partners.
Experience leveraging automation to scale evaluation data generation and analysis.
Experience building evaluation pipelines for conversational AI, dialogue systems, or agentic workflows, including turn‑level and session‑level automated scoring.
Familiarity with agent orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and observability tooling (LangSmith, Braintrust, Arize), with an understanding of how to instrument and evaluate multi‑step agent runs.
Experience designing adversarial tests for tool‑use reliability, function‑calling accuracy, or agent planning quality.

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role.
The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. Apple accepts applications to this posting on an ongoing basis.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the ML Engineer - Automated Evaluation and Adversarial Design in San Diego, CA vacancy

ML Engineer - Automated Evaluation and Adversarial Design
...Productivity and Machine Learning Evaluation team ensures the quality of AI-... ...focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies... ...building or significantly extending ML evaluation systems, including designing...
Suggested
Shift work
Apple
San Diego, CA
1 day ago
ML Engineer - Evaluation Analysis, Metric and Data Strategy
...Productivity and Machine Learning Evaluation team ensures the quality of... ...behavior. The work involves designing feature-level quality metrics... ...metrics for AI-powered or ML-driven features in consumer-facing... ...Experience partnering with engineering or data teams to define data...
Suggested
Apple
San Diego, CA
1 day ago
Cellular Data & Automation Engineer
$120.3k - $210.1k
...elements into a single, integrated design. Join us, and you’ll help us... ...? As a Cellular Data & Automation Engineer, you will be at the center... ...Collaborate closely with AI/ML engineers to integrate new AI... ...and desire to learn and evaluate new technologies. At Apple,...
Suggested
Relocation
Apple Inc.
San Diego, CA
2 days ago
Senior Machine Learning Engineer
$110k - $180k
...Senior Machine Learning Engineer The Marlin Alliance, Inc. is seeking... ...Machine Learning Engineer to design, develop, and implement... ...engineering best practices. The Senior ML Engineer will collaborate with... ...for seamless execution of automated business processes and the generation...
Suggested
Contract work
The Marlin Alliance
San Diego, CA
1 day ago
Video ML Engineer: On‑Device AI for Next‑Gen Video
A leading tech company in San Diego is seeking a passionate machine learning engineer to design and develop cutting-edge algorithms for video processing. This role involves working with deep learning architectures and collaborating with top researchers in the field. Ideal...
Suggested
Apple Inc.
San Diego, CA
3 days ago
Senior ML Engineer - Naval AI & Data Pipelines (TS)
$110k - $180k
A defense technology company is seeking a Senior Machine Learning Engineer to design and implement advanced models for naval applications. Candidates should have robust programming skills, particularly in Python, and familiarity with cloud platforms and containerization...
The Marlin Alliance, Inc.
San Diego, CA
4 days ago
AI/ML Software Engineer Intern — RFIC & LLM Automation
...semiconductor company is seeking a Software Engineer intern for Summer 2026. The role involves designing and implementing AI/ML solutions, with a focus on integrating... ...include full stack development, RFIC design automation, and backend management. Ideal candidates should...
Hourly pay
Summer work
Internship
Murata Manufacturing Co., Ltd.
San Diego, CA
5 days ago
Senior ML Engineer (Hardware-Software Co-Design)
Qualcomm is seeking a Machine Learning Engineer to push the boundaries of technology by creating and implementing efficient machine learning techniques. You will work on optimizing software for AI models and collaborate with teams to enhance various products across mobile...
Nutanix
San Diego, CA
5 days ago
Senior Staff Machine Learning Software Engineer
$202k - $215k
Senior Staff Machine Learning Software Engineer Location: San Diego, CA Job Type: Full-... ...facing tests Building inference‑adjacent evaluation machinery: calibration checks,... ...changes Algorithm research and novel model design live on a separate track. You will collaborate...
Full time
Contract work
Local area
Foresite Labs (Stealth Co)
San Diego, CA
5 days ago
Sr Machine Learning Engineer
$165k - $195k
...Senior Machine Learning Engineer The Marlin Alliance,... ...will be expected to design, develop, and implement... ...practices. The Senior ML Engineer will collaborate... ...seamless execution of automated business processes and... ...feature engineering, model evaluation, and validation....
Contract work
Work at office
The Marlin Alliance
San Diego, CA
8 hours ago
ML Engineer - Creator Studio
...ML Engineer - Creator Studio Apple is where individual imaginations gather together, committing... .... You will work with human interface designers, quality assurance teams, and cross-... ...~ Proven track record in training, evaluating, and deploying multimodal large language...
Apple
San Diego, CA
5 days ago
Modem Machine Learning Engineer
$128k - $192k
...Inc. Job Area: Engineering Group, Engineering... ...deliver production-ready ML solutions. You will place... ...systems. ~ Design, develop, and train robust... ...~ Build and integrate automated, end-to-end ML pipelines... ..., model training, evaluation, and deployment. ~...
Work experience placement
Work from home
Qualcomm
San Diego, CA
3 days ago
Staff Data Engineer - Science
$154.4k - $231.6k
...suffering it causes. As a Staff Data Engineer, this extremely seasoned... ...will have a lead role in the design, development, and testing of... ...in methods, techniques, and evaluation criteria for obtaining... ...based workflows (Github/Gitlab), automated testing, and deployment strategies...
Full time
For contractors
Local area
Night shift
Weekend work
Exact Sciences
San Diego, CA
2 days ago
Remote ML Engineer - Flexible Contract Roles
...talent network: Join our Machine Learning Engineer Expert Network to connect with leading AI... ...our network contribute to: Training and evaluating AI models in Machine Learning Engineering; Creating... ...learning model development, Python & ML frameworks (PyTorch / TensorFlow), model...
Contract work
Remote work
Flexible hours
Mercor Inc
La Mesa, CA
4 days ago
Associate Machine Learning Engineer
$70k - $100k
...Associate ML Engineer/ Agentic AI Are you interested in harnessing... ...with generative AI, automation, and decision-driven systems... ...operations Support agent behavior design, including planning logic,... ..., prompt engineering, model evaluation, and iterative performance tuning...
Internship
Flexible hours
XIFIN
San Diego, CA
2 days ago
Senior Embedded ML Engineer - Cloud Edge & Data Center
$111.3k - $166.9k
Qualcomm Technologies, Inc. is seeking a Senior Software Engineer in San Diego, CA, to design and develop software for Cloud Edge and Data Center applications. This role involves close collaboration with cross-functional teams and requires a strong background in embedded...
Stryker Corporation
San Diego, CA
2 days ago
Staff Machine Learning Engineer - AI/ML Compiler
$160.5k - $240.7k
..., optimize, and deploy ML models on Qualcomm devices... ..., or Qualcomm AI Engine Direct SDK (QAIRT) — and... ...What You'll Do Design, develop, and maintain... ...diagnostics Model Catalog, Automation & Collaboration... ...compilation pipelines and CI/CD evaluation harnesses to scale...
Work experience placement
Work from home
Qualcomm
San Diego, CA
8 hours ago
MLOps Engineer - Remote (AWS Certified Machine Learning)
...Position : MLOps Engineer – Remote (AWS Certified Machine Learning... ..., you'll be responsible for designing, building, and maintaining... ...role requires expertise in automating ML workflows, enhancing model reproducibility... ..., model training, evaluation, and deployment. ~...
Contract work
Remote work
Day shift
MILLENNIUMSOFT
San Diego, CA
3 days ago
Sr Engineer, Machine Learning Engineering (Heterogeneous SW, Adreno GPU)
$140.8k - $211.2k
..., Inc. Job Area: Engineering Group, Engineering Group... .... You will architect, design, develop, test, and deploy... ...in AI and general ML techniques Proven hands-on experience evaluating and optimizing Generative... ...practices (code review, CI/CD, automation, etc.) Strong Linux...
Work experience placement
Work from home
Qualcomm
San Diego, CA
2 days ago
Senior Staff Machine Learning Engineer, (ML Underwriting)
$260k - $310k
...Staff Machine Learning Engineer and become a pivotal part of our innovative ML team. Our team is dedicated... ..., and risk leaders to design, implement, and scale... ...to model training, evaluation, and production deployment... ...infrastructure, monitoring, and automated retraining. You provide...
Work at office
Remote work
Flexible hours
Affirm
San Diego, CA
4 days ago
MLOps Engineer - Remote (AWS Certified Machine Learning)
...Join to apply for the MLOps Engineer - Remote (AWS Certified Machine... ...you'll be responsible for designing, building, and maintaining... ...role requires expertise in automating ML workflows, enhancing model reproducibility... ..., model training, evaluation, and deployment. Excellent...
Contract work
Remote work
Day shift
MILLENNIUMSOFT
San Diego, CA
4 days ago
Remote Data Scientist
...non-technical stakeholders. Collaborate closely with team members to design and execute end-to-end data projects, from ideation to delivery. Continuously improve analytical methodologies and automation processes to enhance data workflows. Communicate effectively in both written...
Remote work
Micro1
Bonita, CA
4 days ago
Data Scientist
...learning and statistical models • Design scalable data pipelines using... ...PyTorch • Mentor junior engineers and lead code reviews, best... ...big data, streaming AI/ML training and prediction pipelines... ...LSTM, hybrid models • Model evaluation, cross validation, hyper...
Omni Inclusive
San Diego, CA
2 days ago
Data Scientist - Finance & Accounting Automation
$46.3 - $69.46 per hour
...accounting operations through intelligent automation. This role combines deep technical... ...adoption across the organization. You will design and deploy AI-enabled solutions, including... ...operations. Design, build, and deploy AI/ML business solutions (including LLMs and AI...
Full time
Work experience placement
Work from home
Qualcomm
San Diego, CA
2 days ago
Data Scientist - Growth
...conduct deep analyses, build customer segments, measure attribution, design experiments, and answer ambiguous, high-impact marketing... ...with our marketing and growth teams to shape marketing strategy, evaluate channel performance, and ensure we are reaching and converting...
Full time
Casual work
H1b
Worldwide
Relocation package
Flexible hours
Art of Problem Solving
San Diego, CA
1 day ago
Remote ML Engineer for Contract Projects
...talent network. Join our Machine Learning Engineer Expert Network to connect with leading AI... ...basis. About Mercor projects: Training and evaluating AI models in Machine Learning Engineering; Creating... ...learning model development, Python & ML frameworks (PyTorch / TensorFlow), model...
Contract work
Remote work
Mercor Inc
National City, CA
1 day ago
Staff Machine Learning Engineer - Model Optimization & Quantization
$158.4k - $237.6k
..., Inc. Job Area: Engineering Group, Engineering Group... ...optimizing and deploying ML models - especially for edge... ...You'll Do Design, develop, and maintain... ...Develop and maintain automated quantization pipelines and evaluation harnesses to scale model...
Work experience placement
Immediate start
Work from home
Qualcomm
San Diego, CA
8 hours ago
Principal Machine Learning Engineer
$264k - $330k
...assistance to power real automation and decision‑making.... ...Principal Machine Learning Engineer to help define and lead... ...Intelligent AI Agents: Design and deploy advanced AI... ...for end‑to‑end ML systems: data collection, model training, evaluation, deployment, and inference...
AppFolio, Inc
San Diego, CA
5 days ago
Sr. Staff Engineer, Machine Learning Engineering (Quantization SW)
$178.4k - $267.6k
...Technologies, Inc. Job Area: Engineering Group, Engineering... ...and tools. Architect, design, develop and test model... ...in AI and general ML techniques. Proven hands-on experience evaluating and optimizing Generative... ...practices (code review, CI/CD, automation, etc.) is a plus. EEO...
Work experience placement
Work from home
Stryker Corporation
San Diego, CA
2 days ago
Applied ML Engineer — Sensing & Location Connectivity
...company based in San Diego is seeking a motivated Software Engineer for Sensing & Connectivity. In this role, you will... ...learning to enhance user experiences. Responsibilities include designing and evaluating ML systems and analyzing behavioral patterns from sensor data...
Apple Inc.
San Diego, CA
1 day ago
