Model Evaluation & Data Quality Lead

Twelve-Labs

Who We Are

At Twelve Labs, we are pioneering the development of cutting-edge multimodal foundation models that can comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, giving us more intuitive and far-reaching capabilities and fundamentally transforming the way we interact with and analyze various forms of media. With a remarkable $107 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and by prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, and Alexandr Wang, among others. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the Role

You will be a vital member of our ML Data Team, which leads the full spectrum of video-language data preparation and model evaluation. This role comes with high ownership and includes defining dataset needs and requirements in consultation with our research and product teams, designing and building data pipelines, and driving our post-training model evaluation strategy. You will also be responsible for automating as much of the repetitive partnership, annotation, and quality-evaluation work as possible. A desire to work cross-functionally and to build relationships is critical for success in this position.

You Will

Model Evaluation: Design and build robust model evaluation frameworks, automating repetitive processes and balancing efficiency against depth when gathering evaluation metrics and feedback.
Portfolio Monitoring: Manage resource allocation and timelines, adjusting direction flexibly based on real-time information across all data streams in your product vertical.
External Partner Collaboration: Improve dataset and process quality through seamless collaboration with vendors and outsourcing partners.
Data Quality & Tooling Advancement: Establish labeling guidelines, monitor data quality, and improve tools and infrastructure to build a sustainable data operations framework.
Internal Collaboration: Partner with Engineering and AI Model teams to align on top-priority data needs, design tools such as analytical reports and dashboards, and clearly communicate project progress.

You May Be a Good Fit If You Have

5+ years of experience working in an AI-focused data operations organization.
A proven track record designing and executing large-scale data or evaluation projects, including gathering, labeling, and post-processing data.
The ability to analyze messy and complex data, identify overarching patterns, and distill your findings into crisp annotation guidelines or model quality reports.
Proficiency with Python, LLMs, or other popular industry tools for automation.
Excellent communication and project management skills, and the ability to support several projects simultaneously.
A foundational understanding of, and interest in, LLMs/VLMs and multimodal AI.
Conviction that data is the key ingredient for the performance and assessment of AI models.

You’ll Stand Out If You Have

Experience in data collection and labeling for multimodal language models.
Experience in red teaming, localization testing, or other evaluation-focused fields.
Experience working with research scientists and engineers.
Expertise or interest in video-centric domains, such as sports, advertising, and content creation.

Tech Stack

Development & Analysis: Python (primarily pandas, Jupyter, etc.)
Data Management & Visualization: Amazon S3, various data visualization tools (framework-agnostic)
Project Management Tools: Linear, Notion

Even if a few checkboxes aren’t ticked by your prior experience, we still encourage you to apply! If you are a 0-to-1 achiever, a ferocious learner, and a kind and fun team player who motivates others, you will find a home at TwelveLabs.

Benefits and Perks

Open and inclusive culture and work environment.
Work closely with a collaborative, mission-driven team on cutting-edge AI technology.
Full health, dental, and vision benefits.
Flexible PTO and parental leave policy.
Office closed the week of Christmas and New Year’s.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you, based on the Model Evaluation & Data Quality Lead vacancy in San Francisco, CA

AI Data & Model Evaluation Lead
...San Francisco is seeking an experienced data operations professional for their ML Data... ...on video-language data preparation, model evaluation, and requires strong skills in Python and... ...datasets, and a commitment to ensuring high-quality data. The position includes benefits...
Quality
Flexible hours
Twelve-Labs
San Francisco, CA
5 days ago
AI Evaluation & Data Ops Lead
TwelveLabs in San Francisco is seeking an experienced data operations professional to join their ML Data Team. The role involves designing robust model evaluation frameworks, managing project timelines, and enhancing data quality through collaboration with vendors. Ideal...
Quality
Flexible hours
TwelveLabs
San Francisco, CA
3 days ago
AI Model Evaluator & Data Quality Analyst
Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates...
Quality
Full time
Welocalize
San Francisco, CA
5 days ago
AI Data Quality & Model Evaluation Associate
Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience...
Quality
Full time
Welocalize
San Francisco, CA
5 days ago
Remote Odia LLM Analyst Model Evaluation
$15 - $20 per hour
...include fact-checking and generating high-quality human evaluation data for AI systems. Ideal candidates will... ...experience with large language models, and excellent writing skills in English... ...creative and technical talent with leading AI research labs. Mercor...
Quality
Remote job
Mercor Inc
San Francisco, CA
2 days ago
AI Model Evaluation Leader — Data Quality
...Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have over 5...
Quality
Flexible hours
Twelve-Labs
San Francisco, CA
4 days ago
ML Evaluation Engineer: Benchmark & Model Quality
A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You will...
Quality
Reducto
San Francisco, CA
5 days ago
ML Evaluation Engineer: Benchmark & Model Quality
...San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating... ...for solving complex problems, and a background in AI or data infrastructure. The position is in-person and offers a...
Quality
Reducto, Inc.
San Francisco, CA
1 day ago
Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco
$180k - $270k
...committed to the highest standards of data security and privacy protection.... ...systems, data pipelines, or evaluation harnesses that can run at scale against live model checkpoints. Can deeply partner... ..., transcription accuracy, audio quality, and reasoning of audio models....
Quality
Full time
Work at office
Worldwide
Plaud
San Francisco, CA
4 days ago
Remote AI Training Specialist: Model Tuning & Evaluation
$25 per hour
...seeking AI Training Experts to assist in training and evaluating cutting-edge AI models. The role involves completing tasks such as analyzing... ...can work from home. Prolific creates a global pool for quality human data, connecting researchers with quality participants.
Quality
Remote job
Hourly pay
Work from home
Flexible hours
Prolific
San Francisco, CA
1 day ago
Software Engineer (Model Evaluation & Benchmarking)
Software Engineer (Model Evaluation & Benchmarking) About the Role We are hiring Engineers focused... ...we measure realism, consistency, and quality across image, video, and multimodal AI systems... ...(C++, Java, Python, or similar). Strong data structures and algorithms fundamentals....
Quality
SpreeAI
San Francisco, CA
2 days ago
AI Model Behavior Engineer—Quality & Evaluation
...Francisco is seeking an innovative Quality Engineer for their AI products.... ...AI engineering team, you will use data to shape how AI behaves, work with partners in leading labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and...
Quality
Notion
San Francisco, CA
4 days ago
Data Quality Lead
...Team Leadership and Management: Lead, mentor, and grow a team of data annotators and quality analysts; ensure team members... ...set goals, and conduct regular evaluations. Strategic Oversight: Collaborate... ...of machine learning models, data annotation, and quality assurance...
Quality
Welocalize
San Francisco, CA
5 days ago
Python Cloud Data Automation Lead
Python Test Lead San Francisco, CA Contract Key Responsibilities:... ..., and maintain scalable cloud data architectures and automated solutions... .... Ensure automation of data quality, security, and governance... ...monitoring of data infrastructure. Evaluate and recommend tools and...
Quality
Contract work
US Staffing Inc
San Francisco, CA
5 days ago
Data Governance Lead
...We’re developing open weight models for individuals, agents, enterprises... ...dataset provenance, training-data summaries, DPIAs, and the... ...Reflection AI's training and evaluation data — so that every model we... ...labeler provenance, and data quality so we can satisfy auditors,...
Quality
Relocation package
B Capital
San Francisco, CA
2 days ago
Data Operations Lead
...About the Role As a Data Operations Lead, you will interface with AI and healthcare customers, scope requirements, and help construct high-quality datasets, environments, and evaluations for AI models. Kinetic Systems works at the intersection of computer-use agents,...
Quality
Work experience placement
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
2 days ago
Data Quality Team Lead (Human-in-the-Loop AI)
$43.5 per hour
...Job Description Job Description Overview Welo Data is looking for a Data Quality Lead to oversee teams of Data Quality Analysts supporting AI model evaluation and improvement. This role is equal parts people leadership, quality ownership, and operational thinking...
Quality
Full time
Contract work
Remote work
Welocalize
San Francisco, CA
5 days ago
Data Operations Lead
...built our own voice stack, models, and orchestration... ...hand with Engineering, Data Ops tackles two critical... ...required to build industry-leading AI audio models, and... ...and meet rigorous quality standards. Lead Labeling... ...data for the purpose of evaluating and selecting you as a...
Quality
Worldwide
Shift work
Happy Robot
San Francisco, CA
3 days ago
Senior Software Engineer - Model Performance
$220k - $320k
...Inference.net trains and hosts specialized language models for companies that need frontier‑quality AI at a fraction of the cost. The models we train match... ...everything end‑to‑end: distillation, training, evaluation, and planet‑scale hosting. We are a well‑funded ten‑...
Quality
Work at office
Inference
San Francisco, CA
1 day ago
Member of Technical Staff, Model Training
...own the training pipeline behind the models that power both Parallel’s search stack... ...and execute high‑value tasks over web data. You will build the path from real product usage to high‑quality training data, fine‑tune and evaluate these models rigorously, and ship them...
Quality
Work at office
Visa sponsorship
Parallel Web Systems
San Francisco, CA
2 days ago
Researcher: Model Architecture
...humans do. We're pioneering the model architectures that will make... .... We're funded by leading investors at Index Ventures and... ...architectures that improve model quality, inference efficiency, and adaptability... ...new frameworks and tools to evaluate architectural innovations,...
Quality
Work at office
Visa sponsorship
Flexible hours
Cartesia AI, Inc.
San Francisco, CA
4 days ago
Engineering Manager, Model Library
...frontier of AI to bring cutting‑edge models into production. We’re growing... ...AI products. THE ROLE You’ll lead the Model Library team at... ...helping developers discover, evaluate, and select the right models for... ...thinking through production‑quality execution. You’ll stay technically...
Quality
Flexible hours
Baseten
San Francisco, CA
5 days ago
Product Manager, Model Behavior at OpenAI San Francisco, CA
$245k - $310k
Product Manager, Model Behavior San Francisco, CA. About the Team The Model Behavior... ...methodologies, tools, and processes for evaluating, tuning, and iterating on model behavior... ...actionable metrics that accurately reflect model quality and user experience at scale. You might...
Quality
Work at office
Relocation package
kozmetickesluzby.vecnakraska.sk - Jobboard
San Francisco, CA
2 days ago
Model Behavior Engineer
$98k - $140k
...About The Role You’ll own the quality bar for Notion AI products... ...engineering, designing evaluation systems, and analyzing data. This team sits in our AI... ...that you'll shape Notion’s model strategy and work directly... ...launch new models with leading research labs — Evaluate and...
Quality
Live in
Work at office
Local area
Notion
San Francisco, CA
2 days ago
Strategy& - Strategy Consulting Business Model Reinvention - Manager
$99k - $232k
...Opportunity As a Strategy & Business Model Reinvention Manager, you will... .... As a Manager, you will lead teams and manage client... ...and inspiring others to deliver quality. You are expected to lead with... ...closely with team members. We evaluate these factors thoughtfully to...
Quality
Full time
H1b
PwC
San Francisco, CA
4 days ago
Frontier Data Engineer | AI Data Pipelines Lead
$140k - $200k
...to manage and enhance human data pipelines in San Francisco. You will work directly with leading AI labs like Google and Meta... ...create datasets crucial for model evaluation and training. This hybrid role... ...pipelines, and ensuring high data quality. A competitive salary range...
Quality
San Francisco, CA
5 days ago
Finance AI Model Evaluator - Contract, 20 hrs/week
$50 - $75 per hour
A leading tech company based in Australia is seeking an AI Model Evaluator on a contract basis. The role involves evaluating AI-generated responses, writing prompts, and providing justifications based on specific criteria. Ideal candidates will hold a Master's degree in...
Hourly pay
Contract work
Mercor
San Francisco, CA
2 days ago
Research Lead, Model Evaluation & Training Insights
Anthropic is seeking a Research Lead for the Training Insights team to shape the evaluation of model capabilities. This hands-on leadership role involves developing innovative evaluation methodologies and mentoring a team of researchers. You will play a crucial role in...
Remote work
Anthropic
San Francisco, CA
4 days ago
Member of Technical Staff, Model Evaluation
$350k
...are building a frontier AI research company and training our own models end-to-end. Our work spans areas such as model training,... ...The Role We are looking for a research engineer to build the evaluation infrastructure that tells us whether our models are getting better...
Mirendil
San Francisco, CA
2 days ago
Remote Internal/EM Physician for AI Model Tuning
A leading AI research accelerator based in San Francisco is seeking a medical expert in internal or... ...involves utilizing your medical expertise to evaluate and enhance AI-driven diagnostic capabilities, ensuring high-quality patient care and safety. Ideal candidates will...
Quality
Remote job
For contractors
Flexible hours
Turing
San Francisco, CA
2 days ago
