Evaluation Lead

SupportFinity

About Archetype AI

Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high‑caliber team from Google, Archetype AI is building a foundation model for the physical world, a real‑time multimodal LLM for real life, transforming real‑world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real‑time physical environment and everything that happens in it.

Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently at the Series A stage and is progressing rapidly to develop technology for its next stage. This presents a unique and once‑in‑a‑lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley.

Our team is headquartered in San Mateo, California, with team members throughout the US and Europe. We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role that exactly fits you below, you can contact us directly with your resume via jobs@archetypeai.io.

About The Role

Archetype AI is seeking a hands‑on Evaluation Lead to build and assess model performance for physical AI. You will design and implement advanced evaluation techniques for assessing the strengths and weaknesses of real‑world AI models, and build and scale evaluation frameworks to rapidly test and generate reports on model performance. Responsibilities include partnering closely with research and engineering teams to develop evaluation methodologies, analytically assessing and improving test datasets, uncovering model weaknesses or risks, and tracking competitive industry benchmarks. This is a high‑impact role for someone who thrives in a fast‑paced AI environment and wants to directly influence our path as we scale our AI technologies and business.

Core Responsibilities

Drive Benchmarking & Evaluation
  • Design and implement rigorous evaluation methodologies and benchmarks for measuring model effectiveness, reliability, alignment, and safety
  • Lead evaluation of model performance, ranging from offline experiments to full production model testing

Build & Scale Evaluation Frameworks
  • Design and oversee the pipelines, dashboards, and tools that automate model evaluation
  • Design and oversee tools for A/B model testing, regression testing, and production model performance

Lead Evaluation Strategy
  • Develop and implement strategies for evaluating physical AI models that can scale across a broad range of real‑world use cases, sensor types, and edge cases
  • Plan, run, and oversee evaluations across internal teams and external customers
  • Drive edge case discovery, red‑teaming, safety, privacy, and risk evaluation, feeding knowledge back to key stakeholders in research and engineering teams

Key Requirements (Minimum Qualifications)
  • Extensive expertise in evaluating AI and machine learning models, ideally in physical AI or a related AI field
  • Experience in designing, implementing, and refining evaluation metrics
  • Deep understanding of machine learning, AI, and generative models
  • Excellent Python and software engineering skills
  • Experience designing and building scalable data pipelines and evaluation tools
  • Experience collaborating closely with key stakeholders from research, engineering, and product teams
  • Strong communication and documentation skills, with a bias for creating detailed evaluation reports that help drive model performance
  • Startup‑ready mindset with the ability to thrive in high‑velocity, high‑ambiguity environments

What We Would Love To See
  • Experience evaluating real‑world, real‑time algorithms
  • Experience evaluating a broad range of sensor types, such as cameras, LIDAR, physical sensors, RF sensors, and beyond
  • A strong scientific approach to evaluation and understanding model performance
  • Experience in evaluating production algorithms
  • Experience building and curating data campaigns to create extensive test datasets
  • Experience managing internal teams and/or external vendors

Vacancy posted 9 hours ago
Similar jobs that could be interesting for you
Based on the Evaluation Lead in San Francisco, CA vacancy
  •  ...Clera is seeking a skilled individual to evaluate medical imaging AI systems, ensuring their reliability and regulatory compliance. You will lead customer interactions from defining evaluation questions to delivering informative reports that guide go/no-go decisions.... 
    Suggested

    Clera

    San Francisco, CA
    2 days ago
  •  ...Twelve Labs in San Francisco is seeking a vital ML Data Team member to lead video-language data preparation and model evaluation. You will define dataset needs, automate evaluation processes, and collaborate cross-functionally with engineering and AI model teams. Ideal... 
    Suggested
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    4 days ago
  •  ...TwelveLabs is seeking a key member for its ML Data Team in San Francisco. This role involves designing evaluation frameworks, managing data operations, and collaborating cross-functionally. Ideal candidates should have over 5 years of experience in AI data operations,... 
    Suggested
    Flexible hours

    TwelveLabs

    San Francisco, CA
    4 days ago
  •  ...A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has... 
    Suggested

    SupportFinity

    San Francisco, CA
    9 hours ago
  • $176k - $253k

     ...Quality. This role involves converting agent quality assessments from vague estimations to concrete metrics, ensuring agents are evaluated, tested, and monitored effectively. Candidates should have experience in building software evaluation frameworks and strong communication... 
    Suggested

    Harper Group

    San Francisco, CA
    1 day ago
  • $146.2k - $261.4k

     ...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology (CAST), part of the Global and Emerging Risks (GER) Division conducts cutting-edge research on transformative, high-impact technologies—including artificial intelligence and... 
    Work experience placement
    Remote work
    Work from home

    Employment Opportunities Inc

    San Francisco, CA
    2 days ago
  • $90k - $110k

    Common Sense Media is seeking an Evaluations Partner Manager in San Francisco, California. This role involves managing the operational execution of the Youth AI Safety Institute's evaluation work, with responsibilities such as coordinating workflow between internal teams... 
    Full time

    Common Sense Media

    San Francisco, CA
    4 days ago
  • $300k - $320k

     ...About the role: We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely... 
    Work at office
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    San Francisco, CA
    9 hours ago
  • Anthropic is seeking a Research Lead for the Training Insights team to shape the evaluation of model capabilities. This hands-on leadership role involves developing innovative evaluation methodologies and mentoring a team of researchers. You will play a crucial role in... 
    Remote work

    Anthropic

    San Francisco, CA
    2 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Technical Program Manager for Research to define and build programs essential for research teams at the cutting edge of AI development. This role requires engagement across complex and ambiguous research initiatives...

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    4 days ago
  • $17 - $27.75 per hour

     ...KPIs Supports the store with recruiting, interviewing, performance evaluation, high-level training as needed Provides necessary feedback and...  ...experience in Managing Competitive Retail Space at the Lead Supervisor level Can bend, reach, stretch as well as lift, carry... 
    Minimum wage
    Shift work

    Coach

    San Francisco, CA
    4 days ago
  •  ...Twelve-Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have... 
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    4 days ago
  • $159k - $260k

     ...We are looking for a hands-on leader to build a new centralized Evaluation team. This team will  be responsible for providing...  ...the next generation of autonomous vehicles! You will... - Lead and build a cross functional team of software engineers, data analysts... 
    Full time
    Work at office
    Work from home
    Flexible hours

    Waabi

    San Francisco, CA
    28 days ago
  • $75k

     ...Lincoln Park, San Francisco. We are seeking a passionate, skilled Lead-chef to create delicious meals for our patrons. You will be...  ...accommodations due to disability or religious reasons will be evaluated in compliance with prevailing regulations. - Whilst English is... 
    Full time
    All shifts
    Shift work
    Afternoon shift
    Early shift

    McCalls Catering & Events

    San Francisco, CA
    9 days ago
  • $150k - $195k

     ...Offers Equity About the Role Plural is hiring an Originations Lead to run the sourcing of new renewable energy companies onto the...  ...simultaneously. Initial Qualification & Diligence – Conduct first-pass evaluation of inbound and sourced opportunities before the structuring... 
    Full time
    Work at office

    Plural

    San Francisco, CA
    9 hours ago
  • $185k - $220k

     ...work. About the Role: We are seeking a strategic and seasoned Lead, Internal Audit and SOX Compliance to join our Finance team reporting...  ...: Lead a comprehensive, strategic governance program that evaluates Internal Controls over Financial Reporting (ICFR) against the COSO... 
    Local area

    Apply

    San Francisco, CA
    1 day ago
  •  ...Offerings Competitive Retirement Benefits with 401(k) match Leading Financial Security Benefits Thoughtful Hybrid Workplace Set...  ...) Member of Technical Staff, Agent Workflow Systems and Evaluation Operational Excellence California Network Operations Center (NOC... 
    Remote work
    Night shift

    SB Energy

    San Francisco, CA
    4 days ago
  • $190k - $270k

     ...Founding Growth Lead We're looking for a Founding Growth Lead who is equally hands-on in driving the market expansion and accelerating...  ...initiatives across every channel where developers discover, evaluate, and adopt new tools. Build and maintain marketing infrastructure... 
    Full time
    Work at office
    Relocation
    Night shift
    Weekend work

    Inworld AI

    San Francisco, CA
    3 days ago
  •  ...Program Lead II, Clinical Program Development (Remote) The Clinical Operations Program Director is responsible to connect science...  ...complexity of trial designs vs speed) Responsible for the programmatic evaluation of risks and mitigations to achieving the asset strategy.... 
    Temporary work
    Work experience placement
    Remote work

    BioSpace, Inc.

    San Francisco, CA
    9 hours ago
  •  ...and increasingly becoming autonomous buyers: agents that shop, evaluate, and transact on behalf of humans. Soon, it will be more important...  ...to serve it. About The Role We're looking for a Founding GTM Lead to work closely with the founders to own outbound pipeline creation... 

    Unusual

    San Francisco, CA
    2 days ago
  • $127.5k - $248.5k

     ...time, all from batteries we already have. Codes and Standards Lead, Energy Storage Redwood Materials is pioneering a...  ...EV traction batteries in energy storage systems. This includes evaluating SAE guidance and FMVSS requirements, identifying transferable safety... 
    Full time

    Redwood Materials

    San Francisco, CA
    3 days ago
  • $152.5k - $193.41k

     ...Transportation Commission in San Francisco is hiring a Principal Modeler to lead the development of travel modeling tools. This role involves...  ...team and hands-on development of modeling functionality to evaluate strategies for improving quality of life. Ideal candidates will... 

    University of Illinois, Gies College of Business

    San Francisco, CA
    4 days ago
  •  ...that accelerate the rollout of its operational and infrastructure footprint. The role involves defining requirements for new sites, evaluating partners, and ensuring operational readiness for autonomous fleet activities. Applicants should possess an undergraduate degree... 

    Zoox

    San Francisco, CA
    9 hours ago
  • $22 - $28 per hour

     ...The Merch Lead is responsible for driving total store results as a member of the store leadership team with specific ownership for...  ...hiring, compensation, assignment, training, promotion, performance evaluation, discipline and discharge. Todd Snyder also provides reasonable... 
    Full time
    Part time
    Local area
    Immediate start
    Flexible hours
    Shift work
    Afternoon shift

    American Eagle Outfitters

    San Francisco, CA
    4 days ago
  • $170k - $220k

     ...transactions and millions of patient journeys. Distyl is backed by leading investors including Lightspeed Venture Partners, Khosla...  ...stronger and more impactful. We are an equal opportunity employer and evaluate all applicants without regard to race, color, religion, sex,... 
    Work at office
    Remote work

    Distyl AI

    San Francisco, CA
    4 days ago
  • $119k - $299.93k

     ...Governance team in San Francisco. This full‑time role involves enhancing project delivery with innovative methodologies, and leading teams to evaluate governance and risk frameworks. Candidates should have at least 8 years of experience, a Bachelor’s degree, and skill in... 
    Full time

    PwC

    San Francisco, CA
    4 days ago
  • $185k - $260k

     ...week in office). About The Role We are hiring a Strategic Sourcing Lead (Ads & Consumer Verticals) to manage the commercial execution...  ...compliance, and risk awareness. Have strong analytical skills to evaluate pricing models, volume ramps, and vendor risk in the context of... 
    Work at office
    3 days per week

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...everything we build. About the role We’re looking for a Growth Lead, Partnerships & Community to own the relationship side of how MoldCo...  ...health, environmental illness, longevity, chronic illness. Evaluate audience fit and host credibility before recommending Prepare guest... 
    Fixed term contract

    The Immune Co.

    San Francisco, CA
    9 hours ago
  •  ...organize human intelligence to power the AI economy. We partner with leading AI labs and enterprises to provide the human intelligence...  ...consistently meet demand Scale & Optimize Sourcing Channels: Evaluate, experiment with, and scale sourcing channels to maximize yield... 
    Work at office
    Relocation package

    Mercor Alabaster

    San Francisco, CA
    2 days ago
  •  ...startup co-founded by Jared Kushner and Elad Gil, and backed by leading Silicon Valley builders including Patrick Collison and Andrej...  ...complex diligence processes, strategic analysis, or technical evaluation processes. Skills & Attributes Strong technical literacy in AI... 
    Work at office
    Relocation
    3 days per week

    Brainco

    San Francisco, CA
    9 hours ago
