AI Evaluation Lead: Real-World Systems Benchmarking

SupportFinity™

A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the AI Evaluation Lead: Real-World Systems Benchmarking vacancy in San Francisco, CA
  •  ...Ando is building AI-native workforce infrastructure...  ...is rebuilding this system from first...  .... We are live with real customers and...  ...Principal AI / ML Systems Lead to serve as Ando's...  ...operate under real-world uncertainty Set...  ...for model evaluation, accuracy, and learning... 
    Suggested
    Hourly pay
    Contract work
    Shift work

    Ando Technologies, Inc

    San Francisco, CA
    4 days ago
  • $146.2k - $261.4k

     ...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology...  .... Your team will build systems to evaluate how AI models...  ...may include developing benchmarks for fully autonomous operations...  ...AI policy across the world. Your team will communicate... 
    Suggested
    Work experience placement
    Remote work
    Work from home

    Employment Opportunities Inc

    San Francisco, CA
    1 day ago
  • $130k - $155k

    Lead, Frontier AI Systems, Centre for AI Excellence  ...authorization support. The World Economic Forum, committed...  ...their responsible adoption and real-world impact through a...  ...system-centric frameworks for evaluation, assurance, and deployment,... 
    Suggested
    Relocation package
    Shift work
    3 days per week

    WEF

    San Francisco, CA
    3 days ago
  •  ...knowledge. They can lead unlimited,...  ...looking for an AI Research Lead to...  ...thousands of real buyer interactions...  ...from real‑world agent interactions...  ...research team. Evaluate new model architectures...  ...‑grade systems. Strong product...  ...tuning on synthetic benchmarks — you’ll be... 
    Suggested
    Full time
    Live in
    Relocation
    Visa sponsorship

    1mind

    San Francisco, CA
    4 days ago
  • $117.2k - $313.7k

     ...Salesforce is the #1 AI CRM, where humans...  ...a way of life. The world of work as we know...  ...career at the company leading workforce...  ...Salesforce. Distributed Systems Software Engineer -...  ...proficiency with solving real-world data...  ...reliably. Critically evaluate code (Human or AI-... 
    Suggested

    Salesforce.Com Inc

    San Francisco, CA
    3 days ago
  •  ...Factory Systems Integrators Partnerships Lead Our mission is to bring autonomy to software...  ...agents that accelerate the world's largest enterprise...  ...being written. You'll have real influence over how we engage...  ...to deliver transformative AI-powered development solutions... 

    Factory

    San Francisco, CA
    2 days ago
  • $185k - $225k

     ...knowledge. They can lead unlimited,...  ...who is analytical, systems‑minded, and thrives...  ...re passionate about AI, data visualization...  ...Opportunity to build a world‑class function from...  ...services to support real‑time, multimodal interactions...  ...tests and evaluation frameworks to ensure... 
    Full time
    Contract work
    Remote work

    1mind AI Inc.

    San Francisco, CA
    4 days ago
  • $300 per month

     ...vertically integrated AI infrastructure...  ...tokens — to power the world's most ambitious AI...  ...to create industry-leading technical content....  ...work while giving you real ownership over...  ...care about, how they evaluate tools, and what content...  ...at Crusoe Energy Systems in San Francisco,... 
    Temporary work
    For contractors
    Day shift

    Crusoe Energy Systems

    San Francisco, CA
    3 days ago
  •  ...Co. is an applied AI startup co-founded...  ...Gil, and backed by leading Silicon Valley builders...  ...for the world’s most important institutions...  ...impact on real-world problems across...  ...governments, healthcare systems, and critical industries...  ..., or technical evaluation processes. Skills... 
    Work at office
    Relocation
    3 days per week

    BrainCo

    San Francisco, CA
    3 days ago
  • $96.8k - $135k

    Job Overview: Real Estate Manager - DSD Infrastructure...  ...in identifying and evaluating real estate...  ...(NASDAQ: KDP) is a leading beverage company in...  ...business model and world-class brand...  ...serve coffee brewing system in North America at...  ...with our open roles. AI does not make hiring... 
    Work at office
    Relocation

    Keurig Dr Pepper

    San Francisco, CA
    3 days ago
  • Turing is seeking a licensed physician to work on evaluating AI systems in clinical settings. This role involves designing evaluation methods for AI performance on medical problems and engaging in research collaborations to enhance AI capabilities. Ideal candidates will... 
    Remote job
    Flexible hours

    Turing

    San Francisco, CA
    4 days ago
  • $167.3k - $261.4k

     ...Term) RAND’s Center on AI, Security, and Technology...  ...as Senior Research Lead - AI Security Portfolio...  ...on securing advanced AI systems, understanding their cyber...  ..., cyber capability evaluation, and infrastructure development...  ...‑edge research with real‑world policy impact.... 
    Fixed term contract
    Remote work
    Work from home

    RAND Corporation

    San Francisco, CA
    3 days ago
  • About Fractional AI How do you turn a decades...  ...into an industry-leading medical coding...  ...getting complex AI systems built right, with strong...  .... M&A. Source and evaluate acquisition targets...  ...translate into real capabilities. What...  ...reputation as the world's best applied AI... 

    Fractional AI

    San Francisco, CA
    3 days ago
  •  ...Kana is an agentic AI platform for marketers...  ...in an AI-driven world — using synthetic data...  .... This role leads that team of AI Solutions...  ...identify and the systems you design will directly...  ...standards, quality benchmarks for AI‑generated...  ...refined through real engagements, and early... 

    Dormont Manufacturing Co

    San Francisco, CA
    3 days ago
  • RTI International is seeking a Health Outcomes Researcher to support real-world evidence and observational research studies. The successful candidate will manage study operations, timelines, and budgets, ensuring alignment with scientific and regulatory standards. The... 

    RTI International

    San Francisco, CA
    2 days ago
  •  ...watched person in the world. Renowned for revolutionizing...  ...are hiring our first AI Enablement Lead to drive how AI is...  ..., data architecting, evaluation, and basic deployment....  .... Strong product and systems thinking. You are good...  ...influences culture in real time. This is your... 
    Relocation package
    Flexible hours

    MrBeast

    San Francisco, CA
    1 day ago
  • $10 per hour

     ...pivotal role in how goods move around the world. We are proud to have the support of...  ...of every freight decision. You will lead the Autonomous Freight Systems team, owning the systems that power...  ...to a tech-run one. You will lead an AI-first engineering team tasked with automating... 

    Voiceflow

    San Francisco, CA
    4 days ago
  • $225k - $320k

    Backed by leading Silicon Valley investors, Peregrine...  ...and accuracy. Our AI‑enabled platform...  ...is applied, evaluated, and operationalized...  ...platform Translate real operational problems...  ...time and batch data systems Ensure models are...  ...that reflect real‑world decision impact, not... 
    Local area

    peregrine technologies

    San Francisco, CA
    3 days ago
  • About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high...  ...AI is seeking a hands‑on Evaluation Lead to build and assess model...  ...tracking competitive industry benchmarks. This is a high‑impact... 

    SupportFinity™

    San Francisco, CA
    2 days ago
  •  ...Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC...  ...month to explore how frontier systems perform — and we use our community...  ...-centered model evaluations. Leading enterprises and AI labs rely on our... 
    Permanent employment
    Work at office

    Arena

    San Francisco, CA
    2 days ago
  • $186.1k - $300.55k

     ...disconnected from business systems of record, costing...  ...you'll do As a Lead Product Manager for Agentic AI Platform, you will...  ...and AI Agent evaluation within the Intelligent...  ...to measure, benchmark, and improve agentic...  ...trust and making the world more agreeable for... 
    Contract work
    Work at office
    Local area
    Remote work
    Shift work
    2 days per week

    DocuSign

    San Francisco, CA
    6 days ago
  • $150k - $170k

     ...Individually, our AI and Superhumans are...  ...center — combining the world’s smartest, auto-...  ...for an Innovation Lead, Office of the CEO...  ...AI capability into real, measurable...  ...across the tools and systems the business uses—and...  ...are applying AI Evaluate third-party tools,... 
    Work at office

    Crescendo Inc

    San Francisco, CA
    4 days ago
  • $357k

     ...applications, processes, and AI into a single,...  ...process to power real-time orchestration...  ...companies in the world Deloitte Tech Fast...  ...an exceptional Lead AI Research Scientist...  ...of enterprise AI systems. This is a research...  ...graphs, and agent evaluation frameworks, while... 
    Work at office
    Remote work
    Flexible hours

    Workato

    San Francisco, CA
    29 days ago
  • About Rad AI At Rad AI, we’re on a mission...  ...datasets in the world, our AI has helped...  ...groups and healthcare systems and nearly 50% of all...  ...that make a real impact. Most recently...  ...Why Join Us: As a Lead Product Manager for...  ...Text, including model evaluation, performance monitoring... 
    Full time
    Remote work
    Flexible hours

    Rad AI

    San Francisco, CA
    2 days ago
  • $180k - $225k

    Lead Product Manager, AI Responsibilities Own the product roadmap...  ...capabilities, including model evaluation, performance...  ...security teams to ensure AI systems meet healthcare...  ...and driven by solving real clinical problems and...  ...application to real-world workflows. Previous... 
    Flexible hours

    Transformcap

    San Francisco, CA
    2 days ago
  • $170k - $190k

     ...semiconductor industry, critical AI infrastructure, and the broader systems that power our world. We work as one team...  ...and running a real-time community platform...  ...product feedback, hiring leads, and an honest read on...  ...and scaling over time Evaluate and recommend sponsorships... 
    Full time

    Drive Capital

    San Francisco, CA
    3 days ago
  •  ...Manager Responsibilities: Lead the data quality evaluation by investigating all...  ...potential impact on study systems setup, study conduction, or...  ...approach. Knowledge of Real-World data sources and processes...  ...Artificial Intelligence (AI). Project Management skills... 
    For contractors

    Katalyst HealthCares & Life Sciences

    San Francisco, CA
    2 days ago
  • $240k - $300k

     ...Cobalt AI is revolutionizing physical safety through...  ...platform that provides real-time, human-verified...  ...and edge-deployed systems. We are looking for...  ...team output. You will lead our current high-caliber...  ...Velocity: Spearhead the evaluation and rollout of AI-driven... 
    Work at office
    Local area
    Remote work

    Cobalt AI

    San Francisco, CA
    4 days ago
  • $180.8k - $226k

     ...frontier of GenAI and human-AI collaboration. The Gen...  .... You will act as the lead investigative analyst...  ..., and define offline evaluation frameworks (e.g.,...  ...to develop reliable AI systems for the world's most important decisions...  ...that deliver real impact. We work closely... 
    Full time
    Shift work

    Scale AI

    San Francisco, CA
    7 days ago
  • $150k - $225k

     ...Job Description Job Title: Functional Systems Lead Location: Burlingame, CA Department...  ...in scaling gigawatt-level innovation at world-class companies such as Tesla, Northvolt,...  ...embedded teams to guide development of real-time, safety-critical firmware (controls... 
    Full time
    Flexible hours

    Peak Energy

    San Francisco, CA
    14 days ago
