AI Evaluation Lead: Real-World Systems Benchmarking

SupportFinity™

A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the AI Evaluation Lead: Real-World Systems Benchmarking in San Francisco, CA vacancy

Senior AI/ML Lead — Real‑World Underwriting & AI Stack
RentFlow (YC S24) is looking for an AI/ML Lead to own underwriting, cash‑flow intelligence, and data insights end‑to‑end. You will model messy, real‑world SMB cash flows and build decisioning systems while leveraging LLMs for unique insights. The ideal candidate will enjoy...
Suggested
RentFlow (YC S24)
San Francisco, CA
1 day ago
Research Lead - AI Cyber Testing & Evaluation
$146.2k - $261.4k
...Description RAND’s Center on AI, Security, and... ...analysis projects, and leading multidisciplinary teams... ...Your team will build systems to evaluate how AI models perform... ...may include developing benchmarks for fully autonomous operations... ...AI policy across the world. Your team will...
Suggested
Fixed term contract
Work experience placement
Remote work
Work from home
Dormont Manufacturing Company
San Francisco, CA
4 days ago
Lead, Frontier AI Systems, Centre for AI Excellence
$130k - $155k
...authorization support. The World Economic Forum,... .... The Centre for AI Excellence (CAIE) is... ...responsible adoption and real‑world impact through... ...CAIE is looking for a Lead for its Frontier AI Systems & Capabilities... ...centric frameworks for evaluation, assurance, and deployment...
Suggested
Relocation package
Shift work
3 days per week
Dormont Manufacturing Co
San Francisco, CA
2 days ago
Evaluations Lead AI Model Progress & Metrics
Cartesia is looking for an Evaluations Lead to design frameworks that measure AI model interactions, focusing on understanding... ..., and adaptability in real-world settings. The role blends research... ...to create impactful evaluation systems. Join our in-person team in San...
Suggested
Cartesia, Inc.
San Francisco, CA
3 days ago
Frontier AI Systems Lead — Strategy, Safety & Impact (SF)
The World Economic Forum is looking for a Lead for its Frontier AI Systems & Capabilities workstream in San Francisco. This role will guide advancements in AI through... ...This position offers a unique opportunity to shape real-world applications and community engagement...
Suggested
World Economic Forum
San Francisco, CA
1 day ago
Document AI Research Lead - San Francisco, CA - $200K-$350K
$200k - $350k
...About the job Document AI Research Lead - San Francisco, CA - $200K-$3... ...building advanced vision-language systems for understanding... ...Build data pipelines and evaluation frameworks to improve model... ...engineering to bring models into real-world applications Influence product...
Direct Line Workforce Solutions
San Francisco, CA
4 days ago
Real Estate Strategy Lead - Warehouse & DSD
$96.8k - $135k
Job Overview: Real Estate Manager - DSD Infrastructure... ...in identifying and evaluating real estate... ...(NASDAQ: KDP) is a leading beverage company in... ...business model and world-class brand... ...serve coffee brewing system in North America at... ...with our open roles. AI does not make hiring...
Work at office
Relocation
Keurig Dr Pepper
San Francisco, CA
22 hours ago
AI Enablement Lead
...unlock liquidity for the world. Backed by leading investors like PayPal... ...toward a future where AI is embedded into how... ...engineering translating real operational pain... ...company, building the systems, standards, and internal... ...Standardize how we build, evaluate, and scale AI...
Work at office
Remote work
2 days per week
B Capital
San Francisco, CA
3 days ago
Lead Technical Content Marketing for AI Infrastructure
$300 per month
...vertically integrated AI infrastructure... ...tokens — to power the world's most ambitious AI... ...to create industry-leading technical content.... ...work while giving you real ownership over... ...care about, how they evaluate tools, and what content... ...at Crusoe Energy Systems in San Francisco,...
Temporary work
For contractors
Day shift
Crusoe Energy Systems
San Francisco, CA
22 hours ago
IT Lead
$170k - $220k
About Distyl AI Distyl is an applied AI technology... ...with the world’s most ambitious institutions... ...self-constructing systems, the development of... ...is backed by leading investors including... ...AI tools, and real‑world business problems... ...employer and evaluate all applicants without...
Work at office
Remote work
Distyl AI, Inc.
San Francisco, CA
3 days ago
AI Enablement Lead
...watched person in the world. Renowned for revolutionizing... ...are hiring our first AI Enablement Lead to drive how AI is... ..., data architecting, evaluation, and basic deployment.... .... Strong product and systems thinking. You are good... ...influences culture in real time. This is your...
Relocation package
Flexible hours
MrBeast
San Francisco, CA
3 days ago
Evaluation Lead
About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high... ...AI is seeking a hands‑on Evaluation Lead to build and assess model... ...tracking competitive industry benchmarks. This is a high‑impact...
SupportFinity™
San Francisco, CA
4 days ago
Senior Research Lead, AI Security Portfolio
$167.3k - $261.4k
Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber capabilities... ...research, cyber capability evaluation, and infrastructure... ...cutting-edge research with real-world policy impact. Qualifications...
Fixed term contract
Work experience placement
Remote work
Work from home
RAND Corporation
San Francisco, CA
4 days ago
Robotics Benchmarking Lead — Member of Technical Staff
...seeking a Member of Technical Staff to lead its new robotics vertical. This role entails defining benchmarking methodologies and producing leaderboards to evaluate robotics capabilities. Ideal... ...will possess a strong interest in AI and robotics, alongside robust coding...
Artificial Analysis, Inc.
San Francisco, CA
22 hours ago
Lead Clinical Data Manager
...Clinical Data Manager Lead the data quality evaluation by investigating all clinical... ...potential impact on study systems setup, study conduction,... ...approach. Knowledge of Real-World data sources and processes... ...Artificial Intelligence (AI). Project Management skills...
For contractors
Katalyst HealthCares & Life Sciences
San Francisco, CA
4 days ago
Lead AI Strategist
...300 products for the world's leading startups, enterprises,... ...that engine at agentic AI, and we are looking for... ...them to make it real. As an AI Strategist... ...specifically agentic systems, create real business... ...technologies, enabling you to evaluate technical feasibility...
Remote work
Symphony Group
San Francisco, CA
1 day ago
Distillation Lead
$195k - $286k
...Distillation Lead Waabi, founded by AI visionary Raquel Urtasun, is the... ...Physical AI. With a world-class team, we're unlocking... ...requirements of both real-time onboard systems and high-throughput... ...contexts. Define rigorous benchmarks and evaluation frameworks to...
Full time
Work at office
Remote work
Work from home
Flexible hours
Waabi
San Francisco, CA
4 days ago
Remote Lead Financial Analyst - AI Trainer ($50-$60 per hour)
$50 - $60 per hour
...DataAnnotation is committed to creating high-quality AI. Enjoy the flexibility of remote work and... ...'s work Push the models with complex, real-world scenarios and edge cases to see where... ...diverse and complex problems and evaluate their outputs Evaluate the quality produced...
Hourly pay
Contract work
For contractors
Work experience placement
Remote work
Data Annotation
Daly City, CA
more than 2 months ago
Safety Systems TPM: AI Risk & Infra Lead
A leading AI research company based in San Francisco is seeking a Technical Program Manager to manage safety system integrations and drive risk mitigation for its models. The ideal candidate is technically skilled and has a solid background in managing complex projects,...
Work at office
Relocation package
3 days per week
OpenAI
San Francisco, CA
3 days ago
Content Lead
$210k - $220k
...engineering in the real world, helping... ...production-ready AI agents that teams... ...platform for building, evaluating, deploying, and operating... ...IVP, Sequoia, Benchmark, CapitalG, and Sapphire... ...for a Content Lead to build and scale... ...and build scalable systems: Establish and evolve...
Work at office
Remote work
Flexible hours
Langchain
San Francisco, CA
3 days ago
AI Data Center Real Estate Strategy Lead
Crusoe in San Francisco is looking for a Real Estate Strategy Manager to oversee the end-to-end acquisition and development lifecycle for data center projects. You will architect site selection strategies and manage relationships with external stakeholders. The ideal candidate...
Crusoe
San Francisco, CA
1 day ago
Lead, AI-Powered GTM Systems
Antler is looking for a GTM Engineer Lead to design and execute AI-powered GTM systems from scratch. This role involves owning the strategy for agent orchestration and building efficient data pipelines to facilitate growth. The candidate should possess strong experience...
Antler Ltd
San Francisco, CA
3 days ago
Evaluations Partnerships Lead - Youth AI Safety
$90k - $110k
Common Sense Media is seeking an Evaluations Partner Manager in San Francisco, California. This role involves managing the operational execution of the Youth AI Safety Institute's evaluation work, with responsibilities such as coordinating workflow between internal teams...
Full time
Common Sense Media
San Francisco, CA
2 days ago
AI Policy & Stakeholder Outreach Lead
$230k - $260k
As Policy Outreach Lead, you will build relationships with external... ...outreach to drive progress in AI policy and AI safety. You will... ..., and steerable AI systems. We want AI to be safe and beneficial... ...creating compelling narratives and real-world examples to support your...
Work at office
Local area
Home office
Relocation package
Anthropic
San Francisco, CA
22 hours ago
Vertical AI Lead
$150k - $250k
About Haize Labs Today’s AI has a jagged intelligence surface.... ...is superhuman on well-defined benchmarks like the International Math Olympiad... ..., but falls short in many real world domains. Haize Labs brings... .... We are hiring a Vertical AI Lead to “own the P&L” of our agent...
Visa sponsorship
Haizelabs
San Francisco, CA
4 days ago
Engagement Lead, AI Strategy
...People are increasingly trusting AI to help them buy things. First... ...buyers: agents that shop, evaluate, and transact on behalf of humans... ...a company has put into the world (often inadvertently). Companies... ..., and contribute back to the systems your teammates rely on...
Shift work
Unusual
San Francisco, CA
2 days ago
AI Systems & Innovation Lead (Nonprofit Tech)
Kai Ming, Inc. is looking for an AI Systems and Innovation Manager in San Francisco. This position involves leading the upgrade of internal technology systems and developing AI tools to enhance operational efficiency within the nonprofit sector. The ideal candidate should...
Kai Ming, Inc.
San Francisco, CA
22 hours ago
System Card Lead - Research Operations for AI Safety
$260k - $310k
Menlo Ventures is looking for a Research Operations Specialist in San Francisco to oversee system card production for AI models. This role involves coordinating contributions from multiple teams, editing for clarity and consistency, and ensuring the integrity of complex...
Menlo Ventures
San Francisco, CA
22 hours ago
AI Automation Senior Lead
...clients in communities around the world. Taskrabbit is a hybrid... ...looking for a builder to help lead our ‘AI for Work’ efforts. Together with... ...efficient. That means building, evaluating vendors, and continuously evolving the AI systems our teams run on with the goal...
Hourly pay
Work at office
Shift work
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
1 day ago
AI Innovation Lead
$150k - $170k
AI Innovation Lead Location: Preferred San Francisco - hybrid schedule, remote... ...scalable, and measurable. Evaluate workflows for automation... ...CRM platforms, or operational systems. Integrate AI into team planning... ...experience applying AI to real workflows (not just...
Full time
Contract work
Work at office
Remote work
Crescendo
San Francisco, CA
1 day ago