AI Evaluation Lead: Real-World Systems Benchmarking

SupportFinity

A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.

Vacancy posted 7 hours ago

Similar jobs that could be interesting for you, based on the AI Evaluation Lead: Real-World Systems Benchmarking vacancy in San Francisco, CA

Senior AI/ML Lead — Real-World SMB Cash-Flow AI
RentFlow, based in San Francisco, is seeking an AI/ML lead to own underwriting and cash-flow insights. This foundational... ...and business outcomes. Ideal candidates have experience with ML systems, LLMs, and a passion for real-world impact.
Suggested
RentFlow
San Francisco, CA
4 days ago
Senior AI/ML Lead — Real‑World Underwriting & AI Stack
RentFlow (YC S24) is looking for an AI/ML Lead to own underwriting, cash‑flow intelligence, and data insights end‑to‑end. You will model messy, real‑world SMB cash flows and build decisioning systems while leveraging LLMs for unique insights. The ideal candidate will enjoy...
Suggested
RentFlow (YC S24)
San Francisco, CA
4 days ago
Evaluations Lead AI Metrics & Interactive Systems
...Cartesia is seeking an Evaluations Lead in San Francisco to design evaluation frameworks for AI models. This role involves defining key model capabilities, developing... ...will have a background in creating evaluation systems for generative models and strong technical skills...
Suggested
Cartesia, Inc.
San Francisco, CA
4 days ago
Research Lead - AI Cyber Testing & Evaluation
$146.2k - $261.4k
...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology... .... Your team will build systems to evaluate how AI models... ...may include developing benchmarks for fully autonomous operations... ...AI policy across the world. Your team will communicate...
Suggested
Work experience placement
Remote work
Work from home
Employment Opportunities Inc
San Francisco, CA
2 days ago
Lead, Frontier AI Systems, Centre for AI Excellence
$140k - $160k
...authorization support. The World Economic Forum,... .... The Centre for AI Excellence (CAIE) is... ...responsible adoption and real-world impact through... ...CAIE is looking for a Lead for its Frontier AI Systems & Capabilities... ...centric frameworks for evaluation, assurance, and deployment...
Suggested
Relocation package
Shift work
3 days per week
World Economic Forum
San Francisco, CA
4 days ago
AI Research Lead
...knowledge. They can lead unlimited,... ...looking for an AI Research Lead to... ...thousands of real buyer interactions... ...from real‑world agent interactions... ...research team. Evaluate new model... ...production‑grade systems. Strong product... ...tuning on synthetic benchmarks — you’ll be...
Full time
Live in
Relocation
Visa sponsorship
1mind
San Francisco, CA
2 days ago
Frontier AI Systems Lead — Strategy, Safety & Impact (SF)
The World Economic Forum is looking for a Lead for its Frontier AI Systems & Capabilities workstream in San Francisco. This role will guide advancements in AI through... ...This position offers a unique opportunity to shape real-world applications and community engagement...
World Economic Forum
San Francisco, CA
4 days ago
Distributed Systems Software Engineer - Public Cloud (Mid/Senior/Lead/Principal)
$117.2k - $313.7k
...Salesforce is the #1 AI CRM, where humans... ...a way of life. The world of work as we know... ...career at the company leading workforce... ...Salesforce. Distributed Systems Software Engineer -... ...proficiency with solving real-world data... ...reliably. Critically evaluate code (Human or AI-...
Salesforce
San Francisco, CA
7 hours ago
Systems Integrator Partnerships Lead
...Systems Integrators Partnerships Lead Our mission is to bring autonomy to software engineering... ...agents that accelerate the world's largest enterprise... ...being written. You'll have real influence over how we engage... ...to deliver transformative AI-powered development solutions...
Factory
San Francisco, CA
2 days ago
Medical Imaging AI Evaluation Lead
...Clera is seeking a skilled individual to evaluate medical imaging AI systems, ensuring their reliability and regulatory compliance. You will lead customer interactions from defining evaluation questions to delivering informative reports that guide go/no-go decisions....
Clera
San Francisco, CA
2 days ago
Evaluation Lead
...About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high... ...AI is seeking a hands‑on Evaluation Lead to build and assess model... ...tracking competitive industry benchmarks. This is a high‑impact...
SupportFinity
San Francisco, CA
7 hours ago
IT Lead
$170k - $220k
...About Distyl AI Distyl is an applied AI... ...partnering with the world’s most ambitious institutions... ...self-constructing systems, the development of... ...is backed by leading investors including... ...AI tools, and real‑world business problems... ...employer and evaluate all applicants without...
Work at office
Remote work
Distyl AI, Inc.
San Francisco, CA
4 days ago
Lead Technical Content Marketing for AI Infrastructure
$300 per month
...vertically integrated AI infrastructure... ...tokens — to power the world's most ambitious AI... ...to create industry-leading technical content.... ...work while giving you real ownership over... ...care about, how they evaluate tools, and what content... ...at Crusoe Energy Systems in San Francisco,...
Temporary work
For contractors
Day shift
Crusoe Energy Systems
San Francisco, CA
7 hours ago
AI-Powered PE Engagement Lead
...Co. is an applied AI startup co-founded... ...Gil, and backed by leading Silicon Valley builders... ...for the world’s most important institutions... ...impact on real-world problems across... ...governments, healthcare systems, and critical industries... ..., or technical evaluation processes. Skills &...
Work at office
Relocation
3 days per week
Brainco
San Francisco, CA
7 hours ago
Founding Recruiting Lead - AI-Native Hiring, SF
...consulting Industry. AI Transformation will... .... You’ll build the systems, the AI workflows,... ...Stripe to deliver real AI outcomes. Traditional... ...companies in the world and skeptical of... ...interview processes, evaluation frameworks,... ...Level: Recruiting Lead (IC). Scope and team...
Interim role
Work at office
Local area
Relocation package
Klarity Intelligence, Inc.
San Francisco, CA
5 days ago
Real Estate Strategy Lead - Warehouse & DSD
$96.8k - $135k
...Job Overview: Real Estate Manager – DSD Infrastructure... ...in identifying and evaluating real estate... ...(NASDAQ: KDP) is a leading beverage company in... ...business model and world-class brand... ...serve coffee brewing system in North America at... ...with our open roles. AI does not make hiring...
Work at office
Relocation
Keurig Dr Pepper Inc.
San Francisco, CA
1 day ago
AI-Driven Autonomous Freight Systems Lead
$10 per hour
...pivotal role in how goods move around the world. We are proud to have the support of... ...of every freight decision. You will lead the Autonomous Freight Systems team, owning the systems that power... ...to a tech-run one. You will lead an AI-first engineering team tasked with automating...
Voiceflow
San Francisco, CA
4 days ago
AI Deployment Lead
...Co. is an applied AI startup co-founded... ...Gil, and backed by leading Silicon Valley builders... ...for the world's most important institutions... ...impact on real-world problems across... ...governments, healthcare systems, and critical industries... ..., technical evaluations, or enterprise AI deployments...
Brainco
San Francisco, CA
2 days ago
Senior Research Lead - AI Security Portfolio
$167.3k - $261.4k
...Term) RAND’s Center on AI, Security, and Technology... ...as Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber... ..., cyber capability evaluation, and infrastructure development... ...‑edge research with real‑world policy impact....
Fixed term contract
Remote work
Work from home
RAND Corporation
San Francisco, CA
3 days ago
Senior Research Lead, AI Security Portfolio
$167.3k - $261.4k
Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber capabilities... ...research, cyber capability evaluation, and infrastructure... ...cutting-edge research with real-world policy impact. Qualifications...
Fixed term contract
Work experience placement
Remote work
Work from home
RAND Corporation
San Francisco, CA
2 days ago
Real-World Evidence & Observational Studies Lead
RTI International is seeking a Health Outcomes Researcher to support real-world evidence and observational research studies. The successful candidate will manage study operations, timelines, and budgets, ensuring alignment with scientific and regulatory standards. The...
RTI International
San Francisco, CA
2 days ago
AI Enablement Lead
...watched person in the world. Renowned for revolutionizing... ...are hiring our first AI Enablement Lead to drive how AI is... ..., data architecting, evaluation, and basic deployment.... .... Strong product and systems thinking. You are good... ...influences culture in real time. This is your...
Relocation package
Flexible hours
MrBeast
San Francisco, CA
6 days ago
AI Model Evaluation Program Lead
$300k - $320k
...About the role: We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be... ...the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI...
Work at office
Home office
Visa sponsorship
Relocation package
Anthropic
San Francisco, CA
7 hours ago
Staff Engineer, Agentic AI Lead Real-World AI Workflows
$160k - $250k
...CLERA is a well-funded AI startup in mechanical engineering seeking a Staff Engineer – Agentic AI to lead the development of their core agent intelligence. This high-impact role requires 7+ years in software engineering and deep experience with LLM-based agents. The on...
Clera
San Francisco, CA
4 days ago
Robotics Benchmarking Lead Member of Technical Staff
...seeking a Member of Technical Staff to lead its new robotics vertical. This role entails defining benchmarking methodologies and producing leaderboards to evaluate robotics capabilities. Ideal... ...will possess a strong interest in AI and robotics, alongside robust coding...
Artificial Analysis, Inc.
San Francisco, CA
1 day ago
Lead Product Manager, Agentic AI Platform
$186.1k - $300.55k
...disconnected from business systems of record, costing... ...you'll do As a Lead Product Manager for Agentic AI Platform, you will... ...and AI Agent evaluation within the Intelligent... ...to measure, benchmark, and improve agentic... ...trust and making the world more agreeable for...
Contract work
Work at office
Local area
Remote work
Shift work
2 days per week
DocuSign
San Francisco, CA
6 days ago
Lead Product Manager, AI
$180k - $225k
...Lead Product Manager, AI Responsibilities Own the product roadmap... ...capabilities, including model evaluation, performance... ...security teams to ensure AI systems meet healthcare... ...and driven by solving real clinical problems and... ...application to real-world workflows. Previous product...
Flexible hours
Transformcap
San Francisco, CA
4 days ago
AI Deployment Strategist: Lead Customer Implementations
...build and orchestrate AI workforces. Our AI... ..., and enterprise systems. Born in Y Combinator... ...- where AI has real consequences. We started... ...and functions. Leading training and onboarding... ...the Best - Join a world-class team of... ...for the purpose of evaluating and selecting you as...
Worldwide
Shift work
HappyRobot
San Francisco, CA
3 days ago
AI Policy & Stakeholder Outreach Lead
$230k - $260k
...As Policy Outreach Lead, you will build relationships with external... ...outreach to drive progress in AI policy and AI safety. You will... ..., and steerable AI systems. We want AI to be safe and beneficial... ...creating compelling narratives and real-world examples to support your...
Work at office
Local area
Home office
Relocation package
Anthropic
San Francisco, CA
7 hours ago
Client Deployment Lead, AI (USA market)
...media footprints in the world. Our remote‑first team,... ...has grown by solving — at real scale — the exact... ...brand new business and AI platform for enterprise... ...sits across a company's systems, runs one shared, org‑wide... ...seeking a Client Deployment Lead, AI to embed on‑site alongside...
Interim role
Remote work
Work from home
Worldwide
Flexible hours
Shift work
TheSoul Group
San Francisco, CA
1 day ago
