AI Data Quality Auditor Onsite, Model Evaluation

$34 per hour

Welocalize

Overview
Welo Data is looking for sharp, curious, and detail-oriented individuals to join our team as a Data Quality Associate. This is not a traditional annotation role. You’ll be working directly with cutting-edge AI systems: evaluating outputs, identifying gaps, and helping improve how these systems behave in real-world scenarios. The work sits at the intersection of data quality, model evaluation, and human judgment, where your ability to think critically matters just as much as following guidelines. We’re looking for people who are naturally curious about AI, comfortable forming opinions, and confident in contributing to conversations with teammates, leads, and stakeholders.

Project Details
Job Title: Data Labeling Associate
Hiring in: NYC, Seattle, Bellevue, Redmond, San Francisco, Sunnyvale, Burlingame
Hours: Full-time, 40 hours per week
Employment Type: W2 Full-Time Employee
Work Authorization: Must be authorized to work in the U.S. (no visa sponsorship)
Pay Rate: $34/hour
Contract Duration: 1-year contract with possibility of extension

Important: This is a 100% onsite position; remote work is not available for this role. To be considered, candidates must be located in or able to commute to one of the following cities: New York City, Seattle, Bellevue, Redmond, San Francisco, Sunnyvale, or Burlingame. Please only apply if you meet this location requirement.

What You'll Do
Evaluate AI model outputs and provide structured, high-quality feedback
Perform audit-based reviews of data and model behavior, identifying patterns, edge cases, and failure modes
Apply guidelines thoughtfully, and flag when they don’t reflect real-world scenarios
Contribute to improving evaluation frameworks, not just executing them
Identify trends in model performance and communicate insights clearly
Participate in team discussions, calibrations, and stakeholder syncs
Partner with leads and cross-functional teams to refine quality standards
Document findings in a clear, concise, and actionable way

What We're Looking For
Native-level language proficiency and a university degree (Bachelor’s or higher)
B2 or superior level of English
1–2 years of professional writing experience with strong, structured writing skills
Ability to apply complex writing rules and guidelines consistently
Strong understanding of safety considerations in GenAI data delivery, with 2+ years of relevant experience
Strong critical thinking and attention to detail
Ability to make sound judgment calls in ambiguous situations
Naturally curious about AI, technology, and how systems behave
Comfortable speaking up, asking questions, and contributing ideas
Strong written and verbal communication skills
Ability to stay consistent while working with evolving guidelines
Experience in data quality, QA, annotation, or analysis is helpful, but not required

Benefits
Paid Vacation: 6 days
Paid Company Holidays: 2 days (Memorial Day and Labor Day)
Paid Sick Leave: accrued per applicable state law and company policy
Medical, Dental, and Vision Insurance (eligibility applies)
Health Savings Account (HSA)
401(k) Retirement Plan
Employee Assistance Program
Additional voluntary benefits (life, accident, critical illness, etc.)
Free Gourmet Food: Free breakfast, lunch, and dinner are provided, featuring a wide variety of cuisines in multiple cafes.
Micro-kitchens & Snacks: Offices are stocked with free snacks and beverages, including premium coffee and La Croix.
Unique Campus Features: Some locations include rooftop nature parks.
Commuter Benefits: Free transport, shuttles, and sometimes bike-to-work perks.

Vacancy posted 3 days ago
