
Model Evaluation & Data Quality Lead

Twelve-Labs

Who We Are

At Twelve Labs, we are pioneering the development of cutting‑edge multimodal foundation models that can comprehend videos just like humans do. Our models have redefined the standards in video‑language modeling, empowering us with more intuitive and far‑reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With a remarkable $107 million in Seed and Series A funding, our company is backed by top‑tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei‑Fei Li, Silvio Savarese, Alexandr Wang and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the Role

You will be a vital member of our ML Data Team – which leads the full spectrum of video‑language data preparation and model evaluation. This role comes with high ownership and includes responsibilities such as defining dataset needs and requirements in consultation with our research and product teams, designing and building data pipelines, and driving our post‑training model evaluation strategy. You will also be responsible for automating as much of the repetitive partnership, annotation, and quality evaluation work as possible. A desire to work cross functionally and to build relationships is critical for success in this position.

You Will

  • Model Evaluation: Design and build robust model evaluation frameworks, automating repetitive processes and maintaining a balanced approach to efficiency and depth to obtain evaluation metrics and feedback.
  • Portfolio Monitoring: Manage resource allocation and timelines, adjusting direction flexibly based on real‑time information across all data streams in your product vertical.
  • External Partner Collaboration: Enhance dataset and process quality through seamless collaboration with vendors and outsourcing partners.
  • Data Quality & Tooling Advancement: Establish labeling guidelines, monitor data quality, and improve tools and infrastructure to build a sustainable data operations framework.
  • Internal Collaboration: Partner with Engineering and AI Model teams to align on top priority data needs, design tools such as analytical reports and dashboards, and clearly communicate project progress.

You May Be a Good Fit If You Have

  • 5+ years of experience working in an AI focused data operations organization.
  • A proven track record designing and executing large‑scale data or evaluation projects, including gathering, labeling, and post‑processing data.
  • The ability to analyze messy and complex data, identify overarching patterns, and distill your findings into crisp annotation guidelines or model quality reports.
  • Proficiency with Python, LLMs, or other popular industry tools for automation.
  • Excellent communication and project management skills, and the ability to support several projects simultaneously.
  • A foundational understanding of and interest in LLMs/VLMs and multimodal AI.
  • Conviction that data is the key ingredient for the performance and assessment of AI models.

You’ll Stand Out If You Have

  • Experience in data collection and labeling for multimodal language models.
  • Experience in red teaming, localization testing, or other evaluation focused fields.
  • Experience working with research scientists and engineers.
  • Expertise or interest in video‑centric domains, such as sports, advertising, and content creation.

Tech Stack

  • Development & Analysis: Python (primarily pandas, Jupyter, etc.)
  • Data Management & Visualization: Amazon S3, various data visualization tools (framework‑agnostic)
  • Project Management Tools: Linear, Notion

Even if there are a few checkboxes that aren’t ticked through your prior experience, we still encourage you to apply! If you are a 0‑1 achiever, a ferocious learner, and a kind and fun team player who motivates others, you will find a home at TwelveLabs.

Benefits and Perks

  • Open and inclusive culture and work environment.
  • Work closely with a collaborative, mission‑driven team on cutting‑edge AI technology.
  • Full health, dental, and vision benefits.
  • Flexible PTO and parental leave policy.
  • Office closed the week of Christmas and New Year’s.

Vacancy posted 4 days ago