World Model Evaluation Lead

Archetype AI

Evaluation Lead

Archetype AI is seeking a hands-on Evaluation Lead to build and assess model performance for physical AI. You will design and implement advanced evaluation techniques for assessing the strengths and weaknesses of real-world AI models, and build and scale evaluation frameworks to rapidly test and generate reports on model performance. Responsibilities include partnering closely with research and engineering teams to develop evaluation methodologies, analytically assessing and improving test datasets, uncovering model weaknesses or risks, and tracking competitive industry benchmarks. This is a high-impact role for someone who thrives in a fast-paced AI environment and wants to directly influence our path as we scale our AI technologies and business.

Core Responsibilities:

Drive Benchmarking & Evaluation
Design and implement rigorous evaluation methodologies and benchmarks for measuring model effectiveness, reliability, alignment, and safety
Lead evaluation of model performance, ranging from offline experiments to full production model testing
Build & Scale Evaluation Frameworks
Design and oversee the pipelines, dashboards, and tools that automate model evaluation
Design and oversee tools for A/B model testing, regression testing, and production model performance monitoring
Lead Evaluation Strategy
Develop and implement strategies for evaluating physical AI models that can scale across a broad range of real-world use cases, sensor types, and edge cases
Plan, run, and oversee evaluations across internal teams and external customers
Drive edge case discovery, red-teaming, and safety, privacy, and risk evaluation, feeding findings back to key stakeholders in research and engineering teams

Key Requirements:

Extensive expertise in evaluating AI and machine learning models, ideally in physical AI or a related AI field
Experience in designing, implementing, and refining evaluation metrics
Deep understanding of machine learning, AI, and generative models
Excellent Python and software engineering skills
Experience designing and building scalable data pipelines and evaluation tools
Experience collaborating closely with key stakeholders from research, engineering, and product teams
Strong communication and documentation skills, with a bias for creating detailed evaluation reports that help drive model performance
Startup-ready mindset with the ability to thrive in high-velocity, high-ambiguity environments

What We Would Love To See:

Experience evaluating real-world, real-time algorithms
Experience evaluating a broad range of sensor types, such as cameras, LIDAR, physical sensors, RF sensors, and beyond
A strong scientific approach to evaluation and understanding model performance
Experience in evaluating production algorithms
Experience building and curating data campaigns to create extensive test datasets
Experience managing internal teams and/or external vendors


Vacancy posted 2 days ago
