AI Evaluation Lead: Real-World Systems Benchmarking
SupportFinity™
A cutting-edge AI technology firm in San Francisco is seeking an Evaluation Lead to drive the assessment of AI model performance. You will design evaluation methodologies, automate evaluation processes, and oversee various evaluation strategies. The ideal candidate has extensive experience in AI model evaluation and is proficient in Python. This high-impact role demands strong collaboration skills and a startup-ready mindset to thrive in a fast-paced environment.
- ...Ando is building AI-native workforce infrastructure... ...is rebuilding this system from first... .... We are live with real customers and... ...Principal AI / ML Systems Lead to serve as Ando's... ...operate under real-world uncertainty Set... ...for model evaluation, accuracy, and learning...SuggestedHourly payContract workShift work
$146.2k - $261.4k
...Research Lead - AI Cyber Testing & Evaluation RAND's Center on AI, Security, and Technology... .... Your team will build systems to evaluate how AI models... ...may include developing benchmarks for fully autonomous operations... ...AI policy across the world. Your team will communicate...SuggestedWork experience placementRemote workWork from home$130k - $155k
Lead, Frontier AI Systems, Centre for AI Excellence ...authorization support. The World Economic Forum, committed... ...their responsible adoption and real-world impact through a... ...system-centric frameworks for evaluation, assurance, and deployment,...SuggestedRelocation packageShift work3 days per week
- ...knowledge. They can lead unlimited,... ...looking for an AI Research Lead to... ...thousands of real buyer interactions... ...from real‑world agent interactions... ...research team. Evaluate new model architectures... ...‑grade systems. Strong product... ...tuning on synthetic benchmarks, you’ll be...SuggestedFull timeLive inRelocationVisa sponsorship
$117.2k - $313.7k
...Salesforce is the #1 AI CRM, where humans... ...a way of life. The world of work as we know... ...career at the company leading workforce... ...Salesforce. Distributed Systems Software Engineer -... ...proficiency with solving real-world data... ...reliably. Critically evaluate code (Human or AI-...Suggested- ...Factory Systems Integrators Partnerships Lead Our mission is to bring autonomy to software... ...agents that accelerate the world's largest enterprise... ...being written. You'll have real influence over how we engage... ...to deliver transformative AI-powered development solutions...
$185k - $225k
...knowledge. They can lead unlimited,... ...who is analytical, systems‑minded, and thrives... ...re passionate about AI, data visualization... ...Opportunity to build a world‑class function from... ...services to support real‑time, multimodal interactions... ...tests and evaluation frameworks to ensure...Full timeContract workRemote work$300 per month
...vertically integrated AI infrastructure... ...tokens — to power the world's most ambitious AI... ...to create industry-leading technical content.... ...work while giving you real ownership over... ...care about, how they evaluate tools, and what content... ...at Crusoe Energy Systems in San Francisco,...Temporary workFor contractorsDay shift- ...Co. is an applied AI startup co-founded... ...Gil, and backed by leading Silicon Valley builders... ...for the world’s most important institutions... ...impact on real-world problems across... ...governments, healthcare systems, and critical industries... ..., or technical evaluation processes. Skills...Work at officeRelocation3 days per week
$96.8k - $135k
Job Overview: Real Estate Manager - DSD Infrastructure... ...in identifying and evaluating real estate... ...(NASDAQ: KDP) is a leading beverage company in... ...business model and world-class brand... ...serve coffee brewing system in North America at... ...with our open roles.AI does not make hiring...Work at officeRelocation- Turing is seeking a licensed physician to work on evaluating AI systems in clinical settings. This role involves designing evaluation methods for AI performance on medical problems and engaging in research collaborations to enhance AI capabilities. Ideal candidates will...Remote jobFlexible hours
$167.3k - $261.4k
...Term) RAND’s Center on AI, Security, and Technology... ...as Senior Research Lead - AI Security Portfolio... ...on securing advanced AI systems, understanding their cyber... ..., cyber capability evaluation, and infrastructure development... ...‑edge research with real‑world policy impact....Fixed term contractRemote workWork from home
- About Fractional AI How do you turn a decades... ...into an industry-leading medical coding... ...getting complex AI systems built right, with strong... .... M&A. Source and evaluate acquisition targets... ...translate into real capabilities. What... ...reputation as the world's best applied AI...
- ...Kana is an agentic AI platform for marketers... ...in an AI-driven world — using synthetic data... .... This role leads that team of AI Solutions... ...identify and the systems you design will directly... ...standards, quality benchmarks for AI‑generated... ...refined through real engagements, and early...
- RTI International is seeking a Health Outcomes Researcher to support real-world evidence and observational research studies. The successful candidate will manage study operations, timelines, and budgets, ensuring alignment with scientific and regulatory standards. The...
- ...watched person in the world. Renowned for revolutionizing... ...are hiring our first AI Enablement Lead to drive how AI is... ..., data architecting, evaluation, and basic deployment.... .... Strong product and systems thinking. You are good... ...influences culture in real time. This is your...Relocation packageFlexible hours
$10 per hour
...pivotal role in how goods move around the world. We are proud to have the support of... ...of every freight decision. You will lead the Autonomous Freight Systems team, owning the systems that power... ...to a tech-run one. You will lead an AI-first engineering team tasked with automating...$225k - $320k
Backed by leading Silicon Valley investors, Peregrine... ...and accuracy. Our AI‑enabled platform... ...is applied, evaluated, and operationalized... ...platform Translate real operational problems... ...time and batch data systems Ensure models are... ...that reflect real‑world decision impact, not...Local area- About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high... ...AI is seeking a hands‑on Evaluation Lead to build and assess model... ...tracking competitive industry benchmarks. This is a high‑impact...
- ...Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC... ...month to explore how frontier systems perform — and we use our community... ...-centered model evaluations. Leading enterprises and AI labs rely on our...Permanent employmentWork at office
$186.1k - $300.55k
...disconnected from business systems of record, costing... ...you'll do As a Lead Product Manager for Agentic AI Platform, you will... ...and AI Agent evaluation within the Intelligent... ...to measure, benchmark, and improve agentic... ...trust and making the world more agreeable for...Contract workWork at officeLocal areaRemote workShift work2 days per week$150k - $170k
...Individually, our AI and Superhumans are... ...center — combining the world’s smartest, auto-... ...for an Innovation Lead, Office of the CEO... ...AI capability into real, measurable... ...across the tools and systems the business uses—and... ...are applying AI Evaluate third-party tools,...Work at office$357k
...applications, processes, and AI into a single,... ...process to power real-time orchestration... ...companies in the world Deloitte Tech Fast... ...an exceptional Lead AI Research Scientist... ...of enterprise AI systems. This is a research... ...graphs, and agent evaluation frameworks, while...Work at officeRemote workFlexible hours- About Rad AI At Rad AI, we’re on a mission... ...datasets in the world, our AI has helped... ...groups and healthcare systems and nearly 50% of all... ...that make a real impact. Most recently... ...Why Join Us: As a Lead Product Manager for... ...Text, including model evaluation, performance monitoring...Full timeRemote workFlexible hours
$180k - $225k
Lead Product Manager, AI Responsibilities Own the product roadmap... ...capabilities, including model evaluation, performance... ...security teams to ensure AI systems meet healthcare... ...and driven by solving real clinical problems and... ...application to real-world workflows. Previous...Flexible hours$170k - $190k
...semiconductor industry, critical AI infrastructure, and the broader systems that power our world. We work as one team... ...and running a real-time community platform... ...product feedback, hiring leads, and an honest read on... ...and scaling over time Evaluate and recommend sponsorships...Full time- ...Manager Responsibilities: Lead the data quality evaluation by investigating all... ...potential impact on study systems setup, study conduction, or... ...approach. Knowledge of Real-World data sources and processes... ...Artificial Intelligence (AI). Project Management skills...For contractors
$240k - $300k
...Cobalt AI is revolutionizing physical safety through... ...platform that provides real-time, human-verified... ...and edge-deployed systems. We are looking for... ...team output. You will lead our current high-caliber... ...Velocity: Spearhead the evaluation and rollout of AI-driven...Work at officeLocal areaRemote work$180.8k - $226k
...frontier of GenAI and human-AI collaboration. The Gen... .... You will act as the lead investigative analyst... ..., and define offline evaluation frameworks (e.g.,... ...to develop reliable AI systems for the world's most important decisions... ...that deliver real impact. We work closely...Full timeShift work$150k - $225k
...Job Description Job Title: Functional Systems Lead Location: Burlingame, CA Department... ...in scaling gigawatt-level innovation at world-class companies such as Tesla, Northvolt,... ...embedded teams to guide development of real-time, safety-critical firmware (controls...Full timeFlexible hours