ML Evaluation Engineer: Benchmark & Model Quality

Reducto

A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You will develop metrics, conduct evaluations, and contribute to model enhancements in a fast-paced environment. If you enjoy solving complex problems and care about precision, this is the role for you.

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the ML Evaluation Engineer: Benchmark & Model Quality vacancy in San Francisco, CA

Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco
$180k - $270k
...Possess strong software engineering skills (especially... ...pipelines, or evaluation harnesses that can... ...against live model checkpoints. Can... ...deeply partner with ML researchers to define... ...) into measurable benchmarks. Are comfortable... ...transcription accuracy, audio quality, and reasoning of...
Quality
Full time
Work at office
Worldwide
Plaud
San Francisco, CA
5 days ago
Software Engineer (Model Evaluation & Benchmarking)
Software Engineer (Model Evaluation & Benchmarking) About the Role We are hiring Engineers focused on AI Model Evaluation... ...measure realism, consistency, and quality across image, video, and multimodal... ...CI/CD workflows. Collaborate with ML researchers and infrastructure teams...
Quality
SpreeAI
San Francisco, CA
3 days ago
ML Evals Engineer — Build Benchmarking Pipelines
...Block, Inc. in San Francisco is looking for a Research Engineer to build evaluation systems for their product Firecrawl. The ideal... ...designing metrics, building pipelines, and defining benchmarks to ensure output quality. The position offers a hybrid work option, competitive...
Quality
AI Chopping Block, Inc.
San Francisco, CA
2 days ago
ML Engineer — LLM Evaluation
...world responsibility in mind. Our ML team comes from a culture of... ...advancement. Responsibilities Own LLM evaluation processes and methods with a focus on generating benchmarks representative of real-world... .... Generate high quality synthetic data, curate labels,...
Quality
Local area
Shift work
Capitolis
San Francisco, CA
1 day ago
Benchmarking Research Engineer: Frontier Model Evaluations
Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning...
Suggested
Full time
Refresh AI
San Francisco, CA
1 day ago
AI Model Evaluation Engineer — Benchmarking & Validation
A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity...
SpreeAI
San Francisco, CA
3 days ago
AI Data Quality & Model Evaluation Associate
Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience...
Quality
Full time
Welocalize
San Francisco, CA
1 day ago
AI Model Evaluation Leader — Data Quality
...Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have over 5...
Quality
Flexible hours
Twelve-Labs
San Francisco, CA
5 days ago
ML Model Serving Engineer
...consisting of a variety of LLM, speech, and vision models. Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and... ...opportunities to produce faster models without sacrificing quality. Use techniques like in-flight batching,...
Quality
Full time
Contract work
Flexible hours
SESAME
San Francisco, CA
5 days ago
Senior Staff ML Engineer, Data & Evaluation (Remote)
...hiring a Senior Staff Machine Learning Engineer, focusing on driving evaluation strategies and data infrastructure... ...field, extensive experience in ML/AI systems, and strong leadership in... ...significantly impact ... operations to ensure quality and efficiency in AI applications....
Quality
Remote job
airbnb, Inc.
San Francisco, CA
1 day ago
AI Model Evaluator & Data Quality Analyst
Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates...
Quality
Full time
Welocalize
San Francisco, CA
1 day ago
Model Evaluation & Data Quality Lead
...multimodal foundation models that have the ability... ...a vital member of our ML Data Team - which leads... ...preparation and model evaluation. This role comes with... ...partnership, annotation, and quality evaluation work as... ...: Partner with Engineering and AI Model teams to...
Quality
Work at office
Worldwide
Flexible hours
Twelve Labs, Inc
San Francisco, CA
3 days ago
Remote AI Training Specialist: Model Tuning & Evaluation
$25 per hour
...is seeking AI Training Experts to assist in training and evaluating cutting-edge AI models. The role involves completing tasks such as analyzing and... ...and can work from home. Prolific creates a global pool for quality human data, connecting researchers with quality...
Quality
Remote job
Hourly pay
Work from home
Flexible hours
Prolific
San Francisco, CA
2 days ago
AI Data & Model Evaluation Lead
...experienced data operations professional for their ML Data Team. This role focuses on video-language data preparation, model evaluation, and requires strong skills in Python and... ...datasets, and a commitment to ensuring high-quality data. The position includes benefits like...
Quality
Flexible hours
Twelve-Labs
San Francisco, CA
1 day ago
AI Model Behavior Engineer—Quality & Evaluation
...technology company located in San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops, strategy, and... ...labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and benefits offered, with a focus...
Quality
Notion
San Francisco, CA
5 days ago
ML Engineer
...Mach9 ML Engineer Role At Mach9, ML Engineers build the perception models at the core of our AI-enabled CAD system... ...Design, train, and evaluate computer vision and 3... ...just publishing or benchmarking. Working knowledge... ...Python and a production-quality ML library like...
Quality
Mach9
San Francisco, CA
5 days ago
Language Model Analyst - Fully Remote | Up to $20/hr Part-time
$20 per hour
...San Francisco, our investors include Benchmark , General Catalyst , Peter Thiel... ...and external tools . Generate high-quality human evaluation data by identifying response strengths... ...completeness of responses. Ensure model responses align with expected conversational...
Quality
Remote job
Contract work
Part time
Summer work
Mercor
San Francisco, CA
17 days ago
Foundation Model Data ML Engineer & Researcher
...California is seeking a highly skilled professional for foundation model development. The ideal candidate will focus on gathering and generating high-quality text data through advanced data engineering techniques. Candidates should have strong expertise in machine learning...
Quality
Liquid AI
San Francisco, CA
3 days ago
Senior ML Engineer, Foundation Models
...come from foundation models that generalize across... ...Model. As a Senior ML Engineer, Foundation Models, you... ...Your models won't just benchmark well; they'll serve... ...Food Foundation Model — evaluating tradeoffs across generalization... ...reliable, production-quality training and...
Quality
Flexible hours
Chef Robotics
San Francisco, CA
6 days ago
ML Engineer
...Machine Learning Engineer opportunities posted... ...end-to-end ML pipelines encompassing... ...practices such as model versioning, experiment... .... Ensure data quality, observability, and... ...Learning Engineer, Core Evaluations The... ...storage. Write tests, benchmarks, and diagnostics to...
Quality
Flexible hours
AI Chopping Block, Inc.
San Francisco, CA
1 day ago
Founding Applied ML Engineer
...Founding Applied ML Engineer Title of Role: Founding Applied ML... ...focuses on providing high-quality audio datasets and associated... ...Apply state-of-the-art models in automatic speech recognition... ...and delivery workflows. Evaluate and benchmark model performance,...
Quality
Work at office
Recruiting from Scratch
San Francisco, CA
5 days ago
ML Engineer
...Machine Learning Engineer Location: Onsite in... ...building foundation AI models for physics that... ...is hiring an ML Engineer to help ship... .../fine-tuning, benchmarking, and delivering results... ...training and evaluation. Run training and... ...and maintain high-quality, reproducible work...
Quality
Work at office
Flexible hours
1 day per week
UniversalAGI
San Francisco, CA
2 days ago
ML Engineer
$198k - $230k
...operate a flexible work model that combines both in‑... ...styles. Senior MLOps Engineer (Applied AI Focus) As... ...generation, model evaluation, and pre/post processing... ...criteria that allows us to benchmark models and make... ...improve ground truth quality. Applied MLOps Practitioner...
Quality
Work at office
Remote work
Work from home
Worldwide
Home office
Flexible hours
CreatorIQ
San Francisco, CA
4 days ago
Staff ML Engineer, Frontier AI
$250k - $350k
...started. The Role: As a Staff ML Engineer on the Frontier AI team at Ambience, you'll own the hardest model quality problems across our clinical AI products... ...-source contributions to ML libraries, benchmarks, or evaluation frameworks. Why Here: Our products...
Quality
Work at office
Immediate start
Remote work
Flexible hours
3 days per week
Ambience Healthcare
San Francisco, CA
4 days ago
Staff Engineer, Evals Platform & Model Benchmarking
$200k
...is seeking a Member of Technical Staff to build the internal evaluations platform that supports critical company decisions. You will design... .... The role is pivotal for research decisions and product quality, with a compensation range between $200K - $550K, including equity...
Quality
Magic
San Francisco, CA
4 days ago
Engineering Manager, Model Library
...bring cutting‑edge models into production.... ...build the platform engineers turn to ship AI products... ...discover, evaluate, and select the right... ...production‑quality execution. You’ll... ...Partner with product, ML, and cross‑functional... ...model evaluation, benchmarking, or comparison frameworks...
Quality
Flexible hours
Baseten
San Francisco, CA
1 day ago
Model Behavior Engineer
$98k - $140k
...The Role You’ll own the quality bar for Notion AI... ...work with product and engineering teams to build systems... ...engineering, designing evaluation systems, and analyzing... ...you'll shape Notion’s model strategy and work directly... ..., Google, and others. Benchmark across dimensions:...
Quality
Live in
Work at office
Local area
Notion
San Francisco, CA
3 days ago
ML-Infrastructure Engineer
$100k - $200k
Coval Simulation & Evaluation that scales voice and chat AI agents ML‑Infrastructure Engineer Salary $100K - $200K Equity... ...run touches multiple models (LLMs, speech‑to‑text... ...(cost, latency, quality) and make the calls on... ...all of them. You'll benchmark the latest models across...
Quality
Full time
Live in
Work at office
Voiceflow
San Francisco, CA
2 days ago
Senior Software Engineer - Model Performance
$220k - $320k
...specialized language models for companies that need frontier-quality AI at a fraction... ..., training, evaluation, and planet-scale... ...ten-person team of engineers who work in-person... ...Build tooling and benchmarks to measure and track... ...Collaborate with applied ML engineers to...
Quality
Work at office
Inference
San Francisco, CA
4 days ago
Senior Machine Learning Engineer
$225k - $325k
...-on, high-ownership role for ML engineers who want to build production models that actually ship, and perform... ...models and audio models, evaluate them with rigorous benchmarks (and human feedback), and deploy... ..., benchmark subjective quality, and inform model iterations....
Quality
H1b
Work at office
Retell AI
San Francisco, CA
2 days ago
