World Model Evaluation Lead

Archetype AI

Evaluation Lead

Archetype AI is seeking a hands-on Evaluation Lead to build evaluations and assess model performance for physical AI. You will design and implement advanced evaluation techniques for assessing the strengths and weaknesses of real-world AI models, and build and scale evaluation frameworks to rapidly test and generate reports on model performance. Responsibilities include partnering closely with research and engineering teams to develop evaluation methodologies, analytically assessing and improving test datasets, uncovering model weaknesses or risks, and tracking competitive industry benchmarks. This is a high-impact role for someone who thrives in a fast-paced AI environment and wants to directly influence our path as we scale our AI technologies and business.

Core Responsibilities:

  • Drive Benchmarking & Evaluation
    • Design and implement rigorous evaluation methodologies and benchmarks for measuring model effectiveness, reliability, alignment, and safety
    • Lead evaluation of model performance, ranging from offline experiments to full production model testing
  • Build & Scale Evaluation Frameworks
    • Design and oversee the pipelines, dashboards, and tools that automate model evaluation
    • Design and oversee tools for A/B model testing, regression testing, and production model performance
  • Lead Evaluation Strategy
    • Develop and implement strategies for evaluating physical AI models that can scale across a broad range of real-world use cases, sensor types, and edge cases
    • Plan, run, and oversee evaluations across internal teams and external customers
    • Drive edge case discovery, red-teaming, safety, privacy, and risk evaluation, feeding knowledge back to key stakeholders in research and engineering teams

Key Requirements:

  • Extensive expertise in evaluating AI and machine learning models, ideally in physical AI or a related AI field
  • Experience in designing, implementing, and refining evaluation metrics
  • Deep understanding of machine learning, AI, and generative models
  • Excellent Python and software engineering skills
  • Experience designing and building scalable data pipelines and evaluation tools
  • Experience collaborating closely with key stakeholders from research, engineering, and product teams
  • Strong communication and documentation skills, with a bias for creating detailed evaluation reports that help drive model performance
  • Startup-ready mindset with the ability to thrive in high-velocity, high-ambiguity environments

What We Would Love To See:

  • Experience evaluating real-world, real-time algorithms
  • Experience evaluating a broad range of sensor types, such as cameras, LIDAR, physical sensors, RF sensors, and beyond
  • A strong scientific approach to evaluation and understanding model performance
  • Experience in evaluating production algorithms
  • Experience building and curating data campaigns to create extensive test datasets
  • Experience managing internal teams and/or external vendors
Vacancy posted 2 days ago