ML Engineer - Automated Evaluation and Adversarial Design

$139.5k - $258.1k

Apple Inc.

San Diego, California, United States
Software and Services

The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio, used by hundreds of millions of people. The team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches.

This role focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single-turn interactions into multi-turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress-testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.

Description

Day-to-day work involves designing, building, and maintaining automated evaluation systems that assess AI feature quality at scale, including multi-turn conversation evaluation and end-to-end agent workflow testing. This includes creating adversarial test suites that probe model weaknesses and running stress tests to ensure features perform under demanding conditions, with particular focus on failure modes that only emerge across extended interactions, such as context degradation, goal drift, and compounding errors.

Typical deliverables include:
  • Evaluation frameworks and rubrics
  • Quality assessment reports
  • Adversarial test case libraries
  • Multi-turn stress-test pipelines
  • Recommendations on model readiness
Responsibilities
  • Define and own the automated evaluation approach for AI features, translating qualitative notions of quality into measurable, reproducible assessments across both single-turn and multi-turn agentic experiences.
  • Build adversarial test suites that target known and emerging model failure modes, including edge cases relevant to productivity application workflows and conversation-level failures such as context loss, instruction forgetting, and cascading errors across multi-step tasks.
  • Develop and execute stress-test protocols that validate minimum performance thresholds under atypical input conditions, including extended conversation lengths, adversarial mid-conversation topic shifts, and complex tool-use sequences.
  • Ensure ongoing alignment between automated and human evaluation methods, identifying and resolving systematic disagreements.
  • Collaborate with engineering partners to integrate evaluation into development and release workflows.
  • Scale adversarial test case generation and stress-test execution, leveraging automation where appropriate, including programmatic generation of multi-turn conversation scenarios and agent interaction traces.
  • Influence model and feature quality decisions by communicating evaluation findings and readiness assessments to cross-functional partners.

Minimum Qualifications
  • Bachelor's degree in Computer Science, Machine Learning, Statistics, or a related field
  • 4+ years of experience building or significantly extending ML evaluation systems, including designing evaluation benchmarks or quality assessment frameworks and evaluating sequential or multi-step AI outputs
  • Experience independently defining evaluation architecture and methodology for AI or ML systems, including the ability to design evaluation approaches where the unit of analysis is a conversation or session rather than a single output
  • Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features, including adversarial scenarios that target failures across multi-turn interactions
  • Experience with Python and ML frameworks (PyTorch, TensorFlow, or equivalent) in production or near-production settings
  • Track record of owning technical direction for evaluation efforts across multiple features or product areas

Preferred Qualifications
  • Experience evaluating user-facing AI features in consumer applications, with an understanding of how technical metrics connect to user-perceived quality
  • Familiarity with productivity software or creative tools, with the ability to assess output quality from a user workflow perspective
  • Experience ensuring alignment between automated and human evaluation methods, including inter-annotator agreement analysis and bias detection
  • Track record of designing evaluation systems that scale across multiple features or product areas without requiring bespoke solutions for each
  • Experience evaluating different types of AI systems, including API-based and custom-trained models
  • Demonstrated ability to communicate evaluation findings and readiness assessments to cross-functional partners
  • Experience leveraging automation to scale evaluation data generation and analysis
  • Experience building evaluation pipelines for conversational AI, dialogue systems, or agentic workflows, including turn-level and session-level automated scoring
  • Familiarity with agent orchestration frameworks (LangChain, LangGraph, CrewAI, AutoGen) and observability tooling (LangSmith, Braintrust, Arize), with an understanding of how to instrument and evaluate multi-step agent runs
  • Experience designing adversarial tests for tool-use reliability, function-calling accuracy, or agent planning quality

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role.
The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become Apple shareholders through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments, as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation, and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. Apple accepts applications to this posting on an ongoing basis.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you, based on the ML Engineer - Automated Evaluation and Adversarial Design vacancy in San Diego, CA
  •  ...Productivity and Machine Learning Evaluation team ensures the quality of AI-...  ...focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies...  ...building or significantly extending ML evaluation systems, including designing... 
    Suggested
    Shift work

    Apple

    San Diego, CA
    1 day ago
  •  ...Productivity and Machine Learning Evaluation team ensures the quality of...  ...behavior. The work involves designing feature-level quality metrics...  ...metrics for AI-powered or ML-driven features in consumer-facing...  ...Experience partnering with engineering or data teams to define data... 
    Suggested

    Apple

    San Diego, CA
    1 day ago
  • $120.3k - $210.1k

     ...elements into a single, integrated design. Join us, and you’ll help us...  ...? As a Cellular Data & Automation Engineer, you will be at the center...  ...Collaborate closely with AI/ML engineers to integrate new AI...  ...and desire to learn and evaluate new technologies. At Apple,... 
    Suggested
    Relocation

    Apple Inc.

    San Diego, CA
    2 days ago
  • $110k - $180k

     ...Senior Machine Learning Engineer The Marlin Alliance, Inc. is seeking...  ...Machine Learning Engineer to design, develop, and implement...  ...engineering best practices. The Senior ML Engineer will collaborate with...  ...for seamless execution of automated business processes and the generation... 
    Suggested
    Contract work

    The Marlin Alliance

    San Diego, CA
    1 day ago
  • A leading tech company in San Diego is seeking a passionate machine learning engineer to design and develop cutting-edge algorithms for video processing. This role involves working with deep learning architectures and collaborating with top researchers in the field. Ideal... 
    Suggested

    Apple Inc.

    San Diego, CA
    3 days ago
  • $110k - $180k

    A defense technology company is seeking a Senior Machine Learning Engineer to design and implement advanced models for naval applications. Candidates should have robust programming skills, particularly in Python, and familiarity with cloud platforms and containerization... 

    The Marlin Alliance, Inc.

    San Diego, CA
    4 days ago
  •  ...semiconductor company is seeking a Software Engineer intern for Summer 2026. The role involves designing and implementing AI/ML solutions, with a focus on integrating...  ...include full stack development, RFIC design automation, and backend management. Ideal candidates should... 
    Hourly pay
    Summer work
    Internship

    Murata Manufacturing Co., Ltd.

    San Diego, CA
    5 days ago
  • Qualcomm is seeking a Machine Learning Engineer to push the boundaries of technology by creating and implementing efficient machine learning techniques. You will work on optimizing software for AI models and collaborate with teams to enhance various products across mobile... 

    Nutanix

    San Diego, CA
    5 days ago
  • $202k - $215k

    Senior Staff Machine Learning Software Engineer Location: San Diego, CA Job Type: Full-...  ...facing tests Building inference‑adjacent evaluation machinery: calibration checks,...  ...changes Algorithm research and novel model design live on a separate track. You will collaborate... 
    Full time
    Contract work
    Local area

    Foresite Labs (Stealth Co)

    San Diego, CA
    5 days ago
  • $165k - $195k

     ...Senior Machine Learning Engineer The Marlin Alliance,...  ...will be expected to design, develop, and implement...  ...practices. The Senior ML Engineer will collaborate...  ...seamless execution of automated business processes and...  ...feature engineering, model evaluation, and validation.... 
    Contract work
    Work at office

    The Marlin Alliance

    San Diego, CA
    8 hours ago
  •  ...ML Engineer - Creator Studio Apple is where individual imaginations gather together, committing...  .... You will work with human interface designers, quality assurance teams, and cross-...  ...~ Proven track record in training, evaluating, and deploying multimodal large language... 

    Apple

    San Diego, CA
    5 days ago
  • $128k - $192k

     ...Inc. Job Area: Engineering Group, Engineering...  ...deliver production-ready ML solutions. You will place...  ...systems. ~ Design, develop, and train robust...  ...~ Build and integrate automated, end-to-end ML pipelines...  ..., model training, evaluation, and deployment. ~... 
    Work experience placement
    Work from home

    Qualcomm

    San Diego, CA
    3 days ago
  • $154.4k - $231.6k

     ...suffering it causes. As a Staff Data Engineer, this extremely seasoned...  ...will have a lead role in the design, development, and testing of...  ...in methods, techniques, and evaluation criteria for obtaining...  ...based workflows (Github/Gitlab), automated testing, and deployment strategies... 
    Full time
    For contractors
    Local area
    Night shift
    Weekend work

    Exact Sciences

    San Diego, CA
    2 days ago
  •  ...talent network: Join our Machine Learning Engineer Expert Network to connect with leading AI...  ...our network contribute to: Training and evaluating AI models in Machine Learning Engineering; Creating...  ...learning model development, Python & ML frameworks (PyTorch / TensorFlow), model... 
    Contract work
    Remote work
    Flexible hours

    Mercor Inc

    La Mesa, CA
    4 days ago
  • $70k - $100k

     ...Associate ML Engineer/ Agentic AI Are you interested in harnessing...  ...with generative AI, automation, and decision-driven systems...  ...operations Support agent behavior design, including planning logic,...  ..., prompt engineering, model evaluation, and iterative performance tuning... 
    Internship
    Flexible hours

    XIFIN

    San Diego, CA
    2 days ago
  • $111.3k - $166.9k

    Qualcomm Technologies, Inc. is seeking a Senior Software Engineer in San Diego, CA, to design and develop software for Cloud Edge and Data Center applications. This role involves close collaboration with cross-functional teams and requires a strong background in embedded... 

    Stryker Corporation

    San Diego, CA
    2 days ago
  • $160.5k - $240.7k

     ..., optimize, and deploy ML models on Qualcomm devices...  ..., or Qualcomm AI Engine Direct SDK (QAIRT) — and...  ...What You'll Do Design, develop, and maintain...  ...diagnostics Model Catalog, Automation & Collaboration...  ...compilation pipelines and CI/CD evaluation harnesses to scale... 
    Work experience placement
    Work from home

    Qualcomm

    San Diego, CA
    8 hours ago
  •  ...Position : MLOps Engineer – Remote (AWS Certified Machine Learning...  ..., you'll be responsible for designing, building, and maintaining...  ...role requires expertise in automating ML workflows, enhancing model reproducibility...  ..., model training, evaluation, and deployment. ~... 
    Contract work
    Remote work
    Day shift

    MILLENNIUMSOFT

    San Diego, CA
    3 days ago
  • $140.8k - $211.2k

     ..., Inc. Job Area: Engineering Group, Engineering Group...  .... You will architect, design, develop, test, and deploy...  ...in AI and general ML techniques Proven hands-on experience evaluating and optimizing Generative...  ...practices (code review, CI/CD, automation, etc.) Strong Linux... 
    Work experience placement
    Work from home

    Qualcomm

    San Diego, CA
    2 days ago
  • $260k - $310k

     ...Staff Machine Learning Engineer and become a pivotal part of our innovative ML team. Our team is dedicated...  ..., and risk leaders to design, implement, and scale...  ...to model training, evaluation, and production deployment...  ...infrastructure, monitoring, and automated retraining. You provide... 
    Work at office
    Remote work
    Flexible hours

    Affirm

    San Diego, CA
    4 days ago
  •  ...Join to apply for the MLOps Engineer - Remote (AWS Certified Machine...  ...you'll be responsible for designing, building, and maintaining...  ...role requires expertise in automating ML workflows, enhancing model reproducibility...  ..., model training, evaluation, and deployment. Excellent... 
    Contract work
    Remote work
    Day shift

    MILLENNIUMSOFT

    San Diego, CA
    4 days ago
  •  ...non-technical stakeholders. Collaborate closely with team members to design and execute end-to-end data projects, from ideation to delivery. Continuously improve analytical methodologies and automation processes to enhance data workflows. Communicate effectively in both written... 
    Remote work

    Micro1

    Bonita, CA
    4 days ago
  •  ...learning and statistical models • Design scalable data pipelines using...  ...PyTorch • Mentor junior engineers and lead code reviews, best...  ...big data, streaming AI/ML training and prediction pipelines...  ...LSTM, hybrid models • Model evaluation, cross validation, hyper... 

    Omni Inclusive

    San Diego, CA
    2 days ago
  • $46.3 - $69.46 per hour

     ...accounting operations through intelligent automation. This role combines deep technical...  ...adoption across the organization. You will design and deploy AI-enabled solutions, including...  ...operations. Design, build, and deploy AI/ML business solutions (including LLMs and AI... 
    Full time
    Work experience placement
    Work from home

    Qualcomm

    San Diego, CA
    2 days ago
  •  ...conduct deep analyses, build customer segments, measure attribution, design experiments, and answer ambiguous, high-impact marketing...  ...with our marketing and growth teams to shape marketing strategy, evaluate channel performance, and ensure we are reaching and converting... 
    Full time
    Casual work
    H1b
    Worldwide
    Relocation package
    Flexible hours

    Art of Problem Solving

    San Diego, CA
    1 day ago
  •  ...talent network. Join our Machine Learning Engineer Expert Network to connect with leading AI...  ...basis. About Mercor projects: Training and evaluating AI models in Machine Learning Engineering; Creating...  ...learning model development, Python & ML frameworks (PyTorch / TensorFlow), model... 
    Contract work
    Remote work

    Mercor Inc

    National City, CA
    1 day ago
  • $158.4k - $237.6k

     ..., Inc. Job Area: Engineering Group, Engineering Group...  ...optimizing and deploying ML models - especially for edge...  ...You'll Do Design, develop, and maintain...  ...Develop and maintain automated quantization pipelines and evaluation harnesses to scale model... 
    Work experience placement
    Immediate start
    Work from home

    Qualcomm

    San Diego, CA
    8 hours ago
  • $264k - $330k

     ...assistance to power real automation and decision‑making....  ...Principal Machine Learning Engineer to help define and lead...  ...Intelligent AI Agents: Design and deploy advanced AI...  ...for end‑to‑end ML systems: data collection, model training, evaluation, deployment, and inference... 

    AppFolio, Inc

    San Diego, CA
    5 days ago
  • $178.4k - $267.6k

     ...Technologies, Inc. Job Area: Engineering Group, Engineering...  ...and tools. Architect, design, develop and test model...  ...in AI and general ML techniques. Proven hands-on experience evaluating and optimizing Generative...  ...practices (code review, CI/CD, automation, etc.) is a plus. EEO... 
    Work experience placement
    Work from home

    Stryker Corporation

    San Diego, CA
    2 days ago
  •  ...company based in San Diego is seeking a motivated Software Engineer for Sensing & Connectivity. In this role, you will...  ...learning to enhance user experiences. Responsibilities include designing and evaluating ML systems and analyzing behavioral patterns from sensor data... 

    Apple Inc.

    San Diego, CA
    1 day ago
