AI Evaluation Lead: Real-World Systems Benchmarking

SupportFinity

A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.

Vacancy posted 7 hours ago
Similar jobs that could be interesting for you
Based on the AI Evaluation Lead: Real-World Systems Benchmarking in San Francisco, CA vacancy
  • RentFlow, based in San Francisco, is seeking an AI/ML lead to own underwriting and cash-flow insights. This foundational...  ...and business outcomes. Ideal candidates have experience with ML systems, LLMs, and a passion for real-world impact.
    Suggested

    RentFlow

    San Francisco, CA
    4 days ago
  • RentFlow (YC S24) is looking for an AI/ML Lead to own underwriting, cash‑flow intelligence, and data insights end‑to‑end. You will model messy, real‑world SMB cash flows and build decisioning systems while leveraging LLMs for unique insights. The ideal candidate will enjoy... 
    Suggested

    RentFlow (YC S24)

    San Francisco, CA
    4 days ago
  •  ...Cartesia is seeking an Evaluations Lead in San Francisco to design evaluation frameworks for AI models. This role involves defining key model capabilities, developing...  ...will have a background in creating evaluation systems for generative models and strong technical skills... 
    Suggested

    Cartesia, Inc.

    San Francisco, CA
    4 days ago
  • $146.2k - $261.4k

     ...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology...  .... Your team will build systems to evaluate how AI models...  ...may include developing benchmarks for fully autonomous operations...  ...AI policy across the world. Your team will communicate... 
    Suggested
    Work experience placement
    Remote work
    Work from home

    Employment Opportunities Inc

    San Francisco, CA
    2 days ago
  • $140k - $160k

     ...authorization support. The World Economic Forum,...  .... The Centre for AI Excellence (CAIE) is...  ...responsible adoption and real-world impact through...  ...CAIE is looking for a Lead for its Frontier AI Systems & Capabilities...  ...centric frameworks for evaluation, assurance, and deployment... 
    Suggested
    Relocation package
    Shift work
    3 days per week

    World Economic Forum

    San Francisco, CA
    4 days ago
  •  ...knowledge. They can lead unlimited,...  ...looking for an AI Research Lead to...  ...thousands of real buyer interactions...  ...from real‑world agent interactions...  ...research team. Evaluate new model...  ...production‑grade systems. Strong product...  ...tuning on synthetic benchmarks — you’ll be... 
    Full time
    Live in
    Relocation
    Visa sponsorship

    1mind

    San Francisco, CA
    2 days ago
  • The World Economic Forum is looking for a Lead for its Frontier AI Systems & Capabilities workstream in San Francisco. This role will guide advancements in AI through...  ...This position offers a unique opportunity to shape real-world applications and community engagement... 

    World Economic Forum

    San Francisco, CA
    4 days ago
  • $117.2k - $313.7k

     ...Salesforce is the #1 AI CRM, where humans...  ...a way of life. The world of work as we know...  ...career at the company leading workforce...  ...Salesforce. Distributed Systems Software Engineer -...  ...proficiency with solving real-world data...  ...reliably. Critically evaluate code (Human or AI-... 

    Salesforce

    San Francisco, CA
    7 hours ago
  •  ...Systems Integrators Partnerships Lead Our mission is to bring autonomy to software engineering...  ...agents that accelerate the world's largest enterprise...  ...being written. You'll have real influence over how we engage...  ...to deliver transformative AI-powered development solutions... 

    Factory

    San Francisco, CA
    2 days ago
  •  ...Clera is seeking a skilled individual to evaluate medical imaging AI systems, ensuring their reliability and regulatory compliance. You will lead customer interactions from defining evaluation questions to delivering informative reports that guide go/no-go decisions.... 

    Clera

    San Francisco, CA
    2 days ago
  •  ...About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high...  ...AI is seeking a hands‑on Evaluation Lead to build and assess model...  ...tracking competitive industry benchmarks. This is a high‑impact... 

    SupportFinity

    San Francisco, CA
    7 hours ago
  • $170k - $220k

     ...About Distyl AI Distyl is an applied AI...  ...partnering with the world’s most ambitious institutions...  ...self-constructing systems, the development of...  ...is backed by leading investors including...  ...AI tools, and real‑world business problems...  ...employer and evaluate all applicants without... 
    Work at office
    Remote work

    Distyl AI, Inc.

    San Francisco, CA
    4 days ago
  • $300 per month

     ...vertically integrated AI infrastructure...  ...tokens — to power the world's most ambitious AI...  ...to create industry-leading technical content....  ...work while giving you real ownership over...  ...care about, how they evaluate tools, and what content...  ...at Crusoe Energy Systems in San Francisco,... 
    Temporary work
    For contractors
    Day shift

    Crusoe Energy Systems

    San Francisco, CA
    7 hours ago
  •  ...Co. is an applied AI startup co-founded...  ...Gil, and backed by leading Silicon Valley builders...  ...for the world’s most important institutions...  ...impact on real-world problems across...  ...governments, healthcare systems, and critical industries...  ..., or technical evaluation processes. Skills &... 
    Work at office
    Relocation
    3 days per week

    Brainco

    San Francisco, CA
    7 hours ago
  •  ...consulting Industry. AI Transformation will...  .... You’ll build the systems, the AI workflows,...  ...Stripe to deliver real AI outcomes. Traditional...  ...companies in the world and skeptical of...  ...interview processes, evaluation frameworks,...  ...Level: Recruiting Lead (IC). Scope and team... 
    Interim role
    Work at office
    Local area
    Relocation package

    Klarity Intelligence, Inc.

    San Francisco, CA
    5 days ago
  • $96.8k - $135k

     ...Job Overview: Real Estate Manager – DSD Infrastructure...  ...in identifying and evaluating real estate...  ...(NASDAQ: KDP) is a leading beverage company in...  ...business model and world-class brand...  ...serve coffee brewing system in North America at...  ...with our open roles. AI does not make hiring... 
    Work at office
    Relocation

    Keurig Dr Pepper Inc.

    San Francisco, CA
    1 day ago
  • $10 per hour

     ...pivotal role in how goods move around the world. We are proud to have the support of...  ...of every freight decision. You will lead the Autonomous Freight Systems team, owning the systems that power...  ...to a tech-run one. You will lead an AI-first engineering team tasked with automating... 

    Voiceflow

    San Francisco, CA
    4 days ago
  •  ...Co. is an applied AI startup co-founded...  ...Gil, and backed by leading Silicon Valley builders...  ...for the world's most important institutions...  ...impact on real-world problems across...  ...governments, healthcare systems, and critical industries...  ..., technical evaluations, or enterprise AI deployments... 

    Brainco

    San Francisco, CA
    2 days ago
  • $167.3k - $261.4k

     ...Term) RAND’s Center on AI, Security, and Technology...  ...as Senior Research Lead - AI Security Portfolio...  ...on securing advanced AI systems, understanding their cyber...  ..., cyber capability evaluation, and infrastructure development...  ...‑edge research with real‑world policy impact.... 
    Fixed term contract
    Remote work
    Work from home

    RAND Corporation

    San Francisco, CA
    3 days ago
  • $167.3k - $261.4k

    Senior Research Lead - AI Security Portfolio...  ...on securing advanced AI systems, understanding their cyber capabilities...  ...research, cyber capability evaluation, and infrastructure...  ...cutting-edge research with real-world policy impact. Qualifications... 
    Fixed term contract
    Work experience placement
    Remote work
    Work from home

    RAND Corporation

    San Francisco, CA
    2 days ago
  • RTI International is seeking a Health Outcomes Researcher to support real-world evidence and observational research studies. The successful candidate will manage study operations, timelines, and budgets, ensuring alignment with scientific and regulatory standards. The... 

    RTI International

    San Francisco, CA
    2 days ago
  •  ...watched person in the world. Renowned for revolutionizing...  ...are hiring our first AI Enablement Lead to drive how AI is...  ..., data architecting, evaluation, and basic deployment....  .... Strong product and systems thinking. You are good...  ...influences culture in real time. This is your... 
    Relocation package
    Flexible hours

    MrBeast

    San Francisco, CA
    6 days ago
  • $300k - $320k

     ...About the role: We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be...  ...the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI... 
    Work at office
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    San Francisco, CA
    7 hours ago
  • $160k - $250k

     ...CLERA is a well-funded AI startup in mechanical engineering seeking a Staff Engineer – Agentic AI to lead the development of their core agent intelligence. This high-impact role requires 7+ years in software engineering and deep experience with LLM-based agents. The on... 

    Clera

    San Francisco, CA
    4 days ago
  •  ...seeking a Member of Technical Staff to lead its new robotics vertical. This role entails defining benchmarking methodologies and producing leaderboards to evaluate robotics capabilities. Ideal...  ...will possess a strong interest in AI and robotics, alongside robust coding... 

    Artificial Analysis, Inc.

    San Francisco, CA
    1 day ago
  • $186.1k - $300.55k

     ...disconnected from business systems of record, costing...  ...you'll do As a Lead Product Manager for Agentic AI Platform, you will...  ...and AI Agent evaluation within the Intelligent...  ...to measure, benchmark, and improve agentic...  ...trust and making the world more agreeable for... 
    Contract work
    Work at office
    Local area
    Remote work
    Shift work
    2 days per week

    DocuSign

    San Francisco, CA
    6 days ago
  • $180k - $225k

     ...Lead Product Manager, AI Responsibilities Own the product roadmap...  ...capabilities, including model evaluation, performance...  ...security teams to ensure AI systems meet healthcare...  ...and driven by solving real clinical problems and...  ...application to real-world workflows. Previous product... 
    Flexible hours

    Transformcap

    San Francisco, CA
    4 days ago
  •  ...build and orchestrate AI workforces. Our AI...  ..., and enterprise systems. Born in Y Combinator...  ...- where AI has real consequences. We started...  ...and functions. Leading training and onboarding...  ...the Best - Join a world-class team of...  ...for the purpose of evaluating and selecting you as... 
    Worldwide
    Shift work

    HappyRobot

    San Francisco, CA
    3 days ago
  • $230k - $260k

     ...As Policy Outreach Lead, you will build relationships with external...  ...outreach to drive progress in AI policy and AI safety. You will...  ..., and steerable AI systems. We want AI to be safe and beneficial...  ...creating compelling narratives and real-world examples to support your... 
    Work at office
    Local area
    Home office
    Relocation package

    Anthropic

    San Francisco, CA
    7 hours ago
  •  ...media footprints in the world. Our remote‑first team,...  ...has grown by solving — at real scale — the exact...  ...brand new business and AI platform for enterprise...  ...sits across a company's systems, runs one shared, org‑wide...  ...seeking a Client Deployment Lead, AI to embed on‑site alongside... 
    Interim role
    Remote work
    Work from home
    Worldwide
    Flexible hours
    Shift work

    TheSoul Group

    San Francisco, CA
    1 day ago
