AI Evaluation Lead: Real-World Systems Benchmarking
SupportFinity
A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.
- RentFlow, based in San Francisco, is seeking an AI/ML lead to own underwriting and cash-flow insights. This foundational... ...and business outcomes. Ideal candidates have experience with ML systems, LLMs, and a passion for real-world impact.
- RentFlow (YC S24) is looking for an AI/ML Lead to own underwriting, cash‑flow intelligence, and data insights end‑to‑end. You will model messy, real‑world SMB cash flows and build decisioning systems while leveraging LLMs for unique insights. The ideal candidate will enjoy...
- ...Cartesia is seeking an Evaluations Lead in San Francisco to design evaluation frameworks for AI models. This role involves defining key model capabilities, developing... ...will have a background in creating evaluation systems for generative models and strong technical skills...
$146.2k - $261.4k
...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology... .... Your team will build systems to evaluate how AI models... ...may include developing benchmarks for fully autonomous operations... ...AI policy across the world. Your team will communicate... Work experience placement, Remote work, Work from home, $140k - $160k
...authorization support. The World Economic Forum,... .... The Centre for AI Excellence (CAIE) is... ...responsible adoption and real-world impact through... ...CAIE is looking for a Lead for its Frontier AI Systems & Capabilities... ...centric frameworks for evaluation, assurance, and deployment... Relocation package, Shift work, 3 days per week
- ...knowledge. They can lead unlimited,... ...looking for an AI Research Lead to... ...thousands of real buyer interactions... ...from real‑world agent interactions... ...research team. Evaluate new model... ...production‑grade systems. Strong product... ...tuning on synthetic benchmarks - you’ll be... Full time, Live in, Relocation, Visa sponsorship
- The World Economic Forum is looking for a Lead for its Frontier AI Systems & Capabilities workstream in San Francisco. This role will guide advancements in AI through... ...This position offers a unique opportunity to shape real-world applications and community engagement...
$117.2k - $313.7k
...Salesforce is the #1 AI CRM, where humans... ...a way of life. The world of work as we know... ...career at the company leading workforce... ...Salesforce. Distributed Systems Software Engineer -... ...proficiency with solving real-world data... ...reliably. Critically evaluate code (Human or AI-...
- ...Systems Integrators Partnerships Lead Our mission is to bring autonomy to software engineering... ...agents that accelerate the world's largest enterprise... ...being written. You'll have real influence over how we engage... ...to deliver transformative AI-powered development solutions...
- ...Clera is seeking a skilled individual to evaluate medical imaging AI systems, ensuring their reliability and regulatory compliance. You will lead customer interactions from defining evaluation questions to delivering informative reports that guide go/no-go decisions....
- ...About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high... ...AI is seeking a hands‑on Evaluation Lead to build and assess model... ...tracking competitive industry benchmarks. This is a high‑impact...
$170k - $220k
...About Distyl AI Distyl is an applied AI... ...partnering with the world’s most ambitious institutions... ...self-constructing systems, the development of... ...is backed by leading investors including... ...AI tools, and real‑world business problems... ...employer and evaluate all applicants without... Work at office, Remote work, $300 per month
...vertically integrated AI infrastructure... ...tokens — to power the world's most ambitious AI... ...to create industry-leading technical content.... ...work while giving you real ownership over... ...care about, how they evaluate tools, and what content... ...at Crusoe Energy Systems in San Francisco,... Temporary work, For contractors, Day shift
- ...Co. is an applied AI startup co-founded... ...Gil, and backed by leading Silicon Valley builders... ...for the world’s most important institutions... ...impact on real-world problems across... ...governments, healthcare systems, and critical industries... ..., or technical evaluation processes. Skills &... Work at office, Relocation, 3 days per week
- ...consulting Industry. AI Transformation will... .... You’ll build the systems, the AI workflows,... ...Stripe to deliver real AI outcomes. Traditional... ...companies in the world and skeptical of... ...interview processes, evaluation frameworks,... ...Level: Recruiting Lead (IC). Scope and team... Interim role, Work at office, Local area, Relocation package
$96.8k - $135k
...Job Overview: Real Estate Manager – DSD Infrastructure... ...in identifying and evaluating real estate... ...(NASDAQ: KDP) is a leading beverage company in... ...business model and world-class brand... ...serve coffee brewing system in North America at... ...with our open roles. AI does not make hiring... Work at office, Relocation, $10 per hour
...pivotal role in how goods move around the world. We are proud to have the support of... ...of every freight decision. You will lead the Autonomous Freight Systems team, owning the systems that power... ...to a tech-run one. You will lead an AI-first engineering team tasked with automating...
- ...Co. is an applied AI startup co-founded... ...Gil, and backed by leading Silicon Valley builders... ...for the world's most important institutions... ...impact on real-world problems across... ...governments, healthcare systems, and critical industries... ..., technical evaluations, or enterprise AI deployments...
$167.3k - $261.4k
...Term) RAND’s Center on AI, Security, and Technology... ...as Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber... ..., cyber capability evaluation, and infrastructure development... ...‑edge research with real‑world policy impact.... Fixed term contract, Remote work, Work from home, $167.3k - $261.4k
Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber capabilities... ...research, cyber capability evaluation, and infrastructure... ...cutting-edge research with real-world policy impact. Qualifications... Fixed term contract, Work experience placement, Remote work, Work from home
- RTI International is seeking a Health Outcomes Researcher to support real-world evidence and observational research studies. The successful candidate will manage study operations, timelines, and budgets, ensuring alignment with scientific and regulatory standards. The...
- ...watched person in the world. Renowned for revolutionizing... ...are hiring our first AI Enablement Lead to drive how AI is... ..., data architecting, evaluation, and basic deployment.... .... Strong product and systems thinking. You are good... ...influences culture in real time. This is your... Relocation package, Flexible hours
$300k - $320k
...About the role: We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be... ...the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI... Work at office, Home office, Visa sponsorship, Relocation package, $160k - $250k
...CLERA is a well-funded AI startup in mechanical engineering seeking a Staff Engineer – Agentic AI to lead the development of their core agent intelligence. This high-impact role requires 7+ years in software engineering and deep experience with LLM-based agents. The on...
- ...seeking a Member of Technical Staff to lead its new robotics vertical. This role entails defining benchmarking methodologies and producing leaderboards to evaluate robotics capabilities. Ideal... ...will possess a strong interest in AI and robotics, alongside robust coding...
$186.1k - $300.55k
...disconnected from business systems of record, costing... ...you'll do As a Lead Product Manager for Agentic AI Platform, you will... ...and AI Agent evaluation within the Intelligent... ...to measure, benchmark, and improve agentic... ...trust and making the world more agreeable for... Contract work, Work at office, Local area, Remote work, Shift work, 2 days per week, $180k - $225k
...Lead Product Manager, AI Responsibilities Own the product roadmap... ...capabilities, including model evaluation, performance... ...security teams to ensure AI systems meet healthcare... ...and driven by solving real clinical problems and... ...application to real-world workflows. Previous product... Flexible hours
- ...build and orchestrate AI workforces. Our AI... ..., and enterprise systems. Born in Y Combinator... ...- where AI has real consequences. We started... ...and functions. Leading training and onboarding... ...the Best - Join a world-class team of... ...for the purpose of evaluating and selecting you as... Worldwide, Shift work
$230k - $260k
...As Policy Outreach Lead, you will build relationships with external... ...outreach to drive progress in AI policy and AI safety. You will... ..., and steerable AI systems. We want AI to be safe and beneficial... ...creating compelling narratives and real-world examples to support your... Work at office, Local area, Home office, Relocation package
- ...media footprints in the world. Our remote‑first team,... ...has grown by solving, at real scale, the exact... ...brand new business and AI platform for enterprise... ...sits across a company's systems, runs one shared, org‑wide... ...seeking a Client Deployment Lead, AI to embed on‑site alongside... Interim role, Remote work, Work from home, Worldwide, Flexible hours, Shift work

