World Model Evaluation Lead
Archetype AI
Archetype AI is seeking a hands-on Evaluation Lead to build evaluation systems and assess model performance for physical AI. You will design and implement advanced techniques for assessing the strengths and weaknesses of real-world AI models, and build and scale evaluation frameworks that rapidly test models and generate performance reports. Responsibilities include partnering closely with research and engineering teams to develop evaluation methodologies, analytically assessing and improving test datasets, uncovering model weaknesses and risks, and tracking competitive industry benchmarks. This is a high-impact role for someone who thrives in a fast-paced AI environment and wants to directly influence our path as we scale our AI technologies and business.
Core Responsibilities:
- Drive Benchmarking & Evaluation
  - Design and implement rigorous evaluation methodologies and benchmarks for measuring model effectiveness, reliability, alignment, and safety
  - Lead evaluation of model performance, ranging from offline experiments to full production model testing
- Build & Scale Evaluation Frameworks
  - Design and oversee the pipelines, dashboards, and tools that automate model evaluation
  - Design and oversee tools for A/B model testing, regression testing, and production model performance monitoring
- Lead Evaluation Strategy
  - Develop and implement strategies for evaluating physical AI models that can scale across a broad range of real-world use cases, sensor types, and edge cases
  - Plan, run, and oversee evaluations across internal teams and external customers
  - Drive edge case discovery, red-teaming, and safety, privacy, and risk evaluation, feeding findings back to key stakeholders in research and engineering teams
Key Requirements:
- Extensive expertise in evaluating AI and machine learning models, ideally in physical AI or a related AI field
- Experience in designing, implementing, and refining evaluation metrics
- Deep understanding of machine learning, AI, and generative models
- Excellent Python and software engineering skills
- Experience designing and building scalable data pipelines and evaluation tools
- Experience collaborating closely with key stakeholders from research, engineering, and product teams
- Strong communication and documentation skills, with a bias for creating detailed evaluation reports that help drive model performance
- Startup-ready mindset with the ability to thrive in high-velocity, high-ambiguity environments
What We Would Love To See:
- Experience evaluating real-world, real-time algorithms
- Experience evaluating a broad range of sensor types, such as cameras, LIDAR, physical sensors, RF sensors, and beyond
- A strong scientific approach to evaluation and understanding model performance
- Experience in evaluating production algorithms
- Experience building and curating data campaigns to create extensive test datasets
- Experience managing internal teams and/or external vendors

