AI Model Evaluation Leader — Data Quality
Twelve-Labs
Twelve-Labs in San Francisco is seeking a dedicated member for its ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and improving data quality through collaboration. Ideal candidates have more than 5 years of experience in AI data operations, proficiency in Python, and strong communication skills. You will work in a flexible, inclusive environment focused on innovative AI technologies, with benefits including full health coverage and flexible PTO.
- A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You will...
- A cutting-edge AI company located in San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating... ...problems, and a background in AI or data infrastructure. The position is in-...
- Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate has a Bachelor's degree and 1-2 years of professional writing experience... (Full time)
$34 per hour
Overview: Welo Data is looking for sharp, curious, and detail-... ...individuals to join our team as Data Quality Associates. This is not a... ...directly with cutting-edge AI systems — evaluating outputs, identifying gaps,... ...of data quality, model evaluation, and human judgment... (Full time, Contract work, Remote work, Visa sponsorship)
$240.45k - $300.3k
...Machine Learning Engineer - Model Evaluations, Public Sector The... ...deploys advanced AI systems, including LLMs... ...regression testing, and quality assurance for ML... ...Background in algorithms, data structures, and object... ...closely with industry leaders like Meta, Ernst & Young... (Full time)
- ...multimodal foundation models that have the ability to... ...Ventures, and prominent AI visionaries and... ...vital member of our ML Data Team, which leads the... ...preparation and model evaluation. This role comes with high... ...partnership, annotation, and quality evaluation work as possible... (Work at office, Worldwide, Flexible hours)
- Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback; applicants need native-level language proficiency and a university degree. Successful candidates... (Full time)
$25 per hour
Prolific is seeking AI Training Experts to assist in training and evaluating cutting-edge AI models. The role involves completing tasks such as analyzing and writing... ...home. Prolific creates a global pool for quality human data, connecting researchers with quality participants... (Remote job, Hourly pay, Work from home, Flexible hours)
- ...Francisco is seeking an experienced data operations professional for... ...-language data preparation, model evaluation, and requires strong skills... ...should have over 5 years in AI data operations, the ability... ...commitment to ensuring high-quality data. The position includes benefits... (Flexible hours)
$180k - $270k
...the world’s most trusted AI work companion for... ...the highest standards of data security and privacy protection... ..., data pipelines, or evaluation harnesses that can run at scale against live model checkpoints. Can deeply... ...transcription accuracy, audio quality, and reasoning of audio... (Full time, Work at office, Worldwide)
- Software Engineer (Model Evaluation & Benchmarking). About the Role: We are hiring Engineers focused on AI Model Evaluation to build the systems... ...measure realism, consistency, and quality across image, video, and... ...Python, or similar). Strong data structures and algorithms...
- ...San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops,... ...the AI engineering team, you will use data to shape how AI behaves, work with partners... ...user satisfaction through effective evaluation baselines. Competitive salary and...
$93.6k - $220.4k
...Safety (T&S) Responsible AI Policy team's mission... ...the development of GenAI models and applications are... ...product, engineering, data science, operations, red... ...drive end-to-end policy to evaluate workflows for your... ...policy and evaluation quality over time. Identify emerging... (Temporary work, Local area)
- ...candidate with a PhD in chemistry to design tasks and workflows evaluating scientific reasoning. Ideal candidates will have strong... ...is a plus. This role is crucial for improving data quality and model evaluation in a collaborative environment.
- ...the next generation of AI systems reason about flight... ...to strengthen model reasoning and technical... ...simulation outputs, test data, or performance models.... ...training, annotation, or evaluating AI‑generated technical... ...assumptions, and reasoning quality. Challenge advanced... (Remote job, For contractors)
- ...how the next generation of AI systems understand construction... ...work. You’ll challenge and evaluate advanced language models on construction engineering... ...experience with AI data training, annotation, or evaluating... ..., cost, schedule, safety, quality, and documentation. Review... (For contractors, Remote work)
- ...YO IT Consulting is seeking a Senior Data Architect to contribute to how AI systems reason about complex enterprise data. This remote... ...with cloud platforms. Responsibilities include evaluating AI responses, refining models, and providing structured feedback on data architecture... (Remote work)
$320k
Software Engineer (Model Quality), Claude Code. San Francisco, CA or New... ...tooling, infrastructure, and evaluations. You'll build systems that help... ...scale. Develop pipelines for data collection, processing, and... ...boundary between engineering and AI research. Have at least 5... (Work experience placement, Visa sponsorship)
- ...experienced Product Manager to drive the Trip Quality Merchandising product strategy. The... ...and a profound understanding of data analysis and AI/ML features. In this role, you will collaborate... ..., leading to improved listing quality evaluations. This position requires strong...
- ...lead the team of all-star AI researchers and... ...responsible for developing the models that drive our products... .... Own the data, training, and eval pipelines... ...iteration velocity. Design evaluations and improve the... ...frontier of speed and quality. Work closely with engineering...
- Role Overview: We’re hiring a Model Performance Engineer to own... ...that makes the rest of the AI team faster. This is not a... ...x speedup with less than 1% quality degradation. Evaluate serving frameworks (vLLM vs... ...frameworks, understanding of data formatting, learning rate schedules...
$148.5k - $266.2k
...Learning Engineering Manager, Model Delivery page is... ...deployment, monitoring, evaluation, reliability, and... ...generative models and other AI capabilities used across... ...tests to prevent quality regressions. Lead reliability... ...). Experience with 3D data (geometry/CAD/BIM) and/... (Remote work)
- TL;DR: We're hiring a Leader for our AI / ML / Data Science team (US, California Bay... ...and domain SMEs, ship models to production, and evolve... ...processes, frameworks and quality standards (design/code reviews... ...feature/representation layers, evaluation harnesses, and monitoring... (Full time, Local area)
$232.5k - $325.5k
...lead teams building AI-assisted... ...partner with engineering leaders in infrastructure,... ...enough to protect quality, and measurable enough... ...for LLM access and evaluation, safe agent workflows... ...evolve the operating model for AI-native... ...with engineering and data teams to define metrics... (For contractors, Work experience placement, Flexible hours, Shift work)
$216k - $270k
...integrate and optimize models for production and... ...ensure a fair and thorough evaluation of all applicants.... ...is to develop reliable AI systems for the world'... ...products provide the high-quality data and full-stack... ...closely with industry leaders like Meta, Ernst & Young... (Full time)
- Welo Data is seeking a Data Labeling Associate to evaluate AI model outputs and improve data quality. The role requires native-level Canadian English proficiency and a relevant degree, offering a full-time, onsite position in cities including San Francisco and NYC. Responsibilities... (Full time)
$20 per hour
...creative and technical talent with leading AI research labs. Headquartered in San... ...external tools. Generate high-quality human evaluation data by identifying response strengths,... ...completeness of responses. Ensure model responses align with expected conversational... (Remote job, Contract work, Part time, Summer work)
$144k - $187k
...s quantitative risk and factor model research. The team develops the... ...governance, and deliver production-quality outputs at scale. Your Key... ...infrastructure that integrates AI to accelerate model development... ...research, engineering, and data teams Presenting complex systems... (Flexible hours)
$70 - $100 per hour
...Mercor is seeking Data Science Experts to guide research teams... ...data science or statistical modeling and strong written communication... ...Key responsibilities include evaluating AI-produced solutions, designing... ...with experts to ensure data quality. The compensation ranges from... (Hourly pay, Contract work, Remote work)
$192k - $260k
...are passionate about enabling data teams to solve the world's toughest... ...the world's best data and AI infrastructure platform so our... ...business. Foundation Model Serving is the API Product for... ...Establish best practices for code quality, testing, and operational readiness... (Local area, Worldwide)
- underwriting team lead San Francisco, CA
- group finance manager San Francisco, CA
- office team lead San Francisco, CA
- team leader San Francisco, CA
- team lead data science San Francisco, CA
- disability team leader San Francisco, CA
- group operations director San Francisco, CA
- school leader San Francisco, CA
- leader San Francisco, CA
- remote team lead San Francisco, CA


