ML Evaluation Engineer: Benchmark & Model Quality

Reducto, Inc.

A cutting-edge AI company located in San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating with teams to identify model weaknesses, and developing automated processes. Candidates should possess strong Python skills, a passion for solving complex problems, and a background in AI or data infrastructure. The position is in-person and offers a dynamic work environment focused on growth and impact.

Vacancy posted 4 hours ago
Similar jobs that could be interesting for you, based on the ML Evaluation Engineer: Benchmark & Model Quality in San Francisco, CA vacancy
  •  ...A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You... 
    Quality

    Reducto

    San Francisco, CA
    4 days ago
  • $180k - $270k

     ...Possess strong software engineering skills (especially...  ...pipelines, or evaluation harnesses that can...  ...against live model checkpoints. Can deeply...  ...partner with ML researchers to define...  ...) into measurable benchmarks. Are comfortable building...  ...accuracy, audio quality, and reasoning of... 
    Quality
    Full time
    Work at office
    Worldwide

    Plaud

    San Francisco, CA
    4 days ago
  •  ...Role We are hiring Engineers focused on AI Model Evaluation to build the systems that...  ...models through automated benchmarking, dataset-driven testing,...  ...realism, consistency, and quality across image, video, and...  ...workflows. Collaborate with ML researchers and... 
    Quality

    SPREEAI

    San Francisco, CA
    3 days ago
  • $208k - $300k

    Machine Learning Engineer - Model Evaluations, Public Sector San Francisco, CA; St....  ...Sector The Public Sector ML team at Scale deploys advanced...  ...Design test datasets and benchmarks to measure generalization,...  ...monitoring, regression testing, and quality assurance for ML systems.... 
    Quality
    Full time

    Scale AI, Inc.

    San Francisco, CA
    1 day ago
  •  ...Block, Inc. in San Francisco is looking for a Research Engineer to build evaluation systems for their product Firecrawl. The ideal...  ...designing metrics, building pipelines, and defining benchmarks to ensure output quality. The position offers a hybrid work option, competitive... 
    Quality

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  •  ...world responsibility in mind. Our ML team comes from a culture of...  ...advancement. Responsibilities Own LLM evaluation processes and methods with a focus on generating benchmarks representative of real-world...  .... Generate high quality synthetic data, curate labels,... 
    Quality
    Local area
    Shift work

    Capitolis

    San Francisco, CA
    2 days ago
  • $208k - $300k

     ...leading AI company is seeking a Machine Learning Engineer in the Public Sector to develop automated evaluation pipelines for AI models. You will work on advanced AI systems and...  ...strong programming background and experience in ML evaluation frameworks. Competitive salary... 

    Scale AI

    San Francisco, CA
    4 days ago
  • Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... 
    Full time

    Refresh AI

    San Francisco, CA
    2 days ago
  • $204k - $259k

     ...demonstration, generative modeling, Bayesian...  ...learning, and robust evaluation. This role...  ...Senior Staff Software Engineer. You will:...  ...evaluation systems and benchmarks for Waymo...  ...for evaluating the quality, safety, and realism...  ...Experience in ML engineering and applied... 
    Quality
    Full time
    Temporary work
    Remote work

    Waymo

    San Francisco, CA
    1 day ago
  • A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity... 

    SpreeAI

    San Francisco, CA
    4 days ago
  •  ...Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have over 5... 
    Quality
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    3 days ago
  • Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience... 
    Quality
    Full time

    Welocalize

    San Francisco, CA
    2 days ago
  • $15 - $20 per hour

     ...position. Responsibilities include fact-checking and generating high-quality human evaluation data for AI systems. Ideal candidates will have a Bachelor's degree, significant experience with large language models, and excellent writing skills in English. This role offers $15-... 
    Quality
    Remote job

    Mercor

    San Francisco, CA
    4 days ago
  • $15 - $20 per hour

     ...seeking a Generalist with proficiency in English and Kannada to conduct fact-checking and generate evaluation data. This role involves assessing model response quality and ensuring alignment with conversational guidelines. The ideal candidate will possess a Bachelor's... 
    Quality
    Remote job
    Hourly pay

    Mercor

    San Francisco, CA
    1 day ago
  • $300k - $320k

     ...Technical Program Manager to lead our AI model evaluation initiatives across multiple...  ...functional programs in AI development, ML engineering, or related fields. You’ll be joining...  ...strategic priorities with rapid, high-quality execution. Thrive in unstructured environments... 
    Quality
    Work at office
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    San Francisco, CA
    2 days ago
  •  ...consisting of a variety of LLM, speech, and vision models. Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and...  ...opportunities to produce faster models without sacrificing quality. Use techniques like in-flight batching,... 
    Quality
    Full time
    Contract work
    Flexible hours

    SESAME

    San Francisco, CA
    1 day ago
  • Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates... 
    Quality
    Full time

    Welocalize

    San Francisco, CA
    2 days ago
  •  ...hiring a Senior Staff Machine Learning Engineer, focusing on driving evaluation strategies and data infrastructure...  ...field, extensive experience in ML/AI systems, and strong leadership in...  ...significantly impact ... operations to ensure quality and efficiency in AI applications.... 
    Quality
    Remote job

    airbnb, Inc.

    San Francisco, CA
    2 days ago
  •  ...multimodal foundation models that have the ability...  ...a vital member of our ML Data Team - which leads...  ...preparation and model evaluation. This role comes with...  ...partnership, annotation, and quality evaluation work as...  ...: Partner with Engineering and AI Model teams to... 
    Quality
    Work at office
    Worldwide
    Flexible hours

    Twelve Labs, Inc

    San Francisco, CA
    4 days ago
  • $20 per hour

     ...San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel...  ...external tools. Generate high-quality human evaluation data by identifying response strengths...  ...completeness of responses. Ensure model responses align with expected conversational... 
    Quality
    Remote job
    Contract work
    Part time
    Summer work

    Mercor

    San Francisco, CA
    17 days ago
  • $80 - $150 per hour

    Mercor is looking for a Biology PhD Expert to evaluate technical quality and scientific reasoning across various research domains. The role involves reviewing research outputs and collaborating with experts to improve scientific rigor. Applicants should hold a PhD in relevant... 
    Quality
    Remote job

    Mercor

    San Francisco, CA
    3 days ago
  • $25 per hour

     ...is seeking AI Training Experts to assist in training and evaluating cutting-edge AI models. The role involves completing tasks such as analyzing and...  ...and can work from home. Prolific creates a global pool for quality human data, connecting researchers with quality... 
    Quality
    Remote job
    Hourly pay
    Work from home
    Flexible hours

    Prolific

    San Francisco, CA
    3 days ago
  •  ...experienced data operations professional for their ML Data Team. This role focuses on video-language data preparation, model evaluation, and requires strong skills in Python and...  ...datasets, and a commitment to ensuring high-quality data. The position includes benefits like... 
    Quality
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    2 days ago
  •  ...technology company located in San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops, strategy, and...  ...labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and benefits offered, with a focus... 
    Quality

    Notion

    San Francisco, CA
    1 day ago
  •  ...California is seeking a highly skilled professional for foundation model development. The ideal candidate will focus on gathering and generating high-quality text data through advanced data engineering techniques. Candidates should have strong expertise in machine learning... 
    Quality

    Liquid AI

    San Francisco, CA
    3 days ago
  •  ...Mach9 ML Engineer Role At Mach9, ML Engineers build the perception models at the core of our AI-enabled CAD system...  ...Design, train, and evaluate computer vision and 3...  ...just publishing or benchmarking. Working knowledge...  ...Python and a production-quality ML library like... 
    Quality

    Mach9

    San Francisco, CA
    1 day ago
  • $200k

     ...Founding ML Engineer San Francisco, on-site, full-time...  ...access to a frontier model that isn't public yet...  ...proprietary data and benchmark against classical approaches...  ...Design rigorous evaluation frameworks for small datasets...  ...decisive, and keeping quality systems lightweight... 
    Quality
    Full time
    Night shift
    Day shift
    Afternoon shift

    Stealth Deep Tech

    San Francisco, CA
    3 days ago
  •  ...configurations. As a Senior ML Engineer, Manipulation, you...  ...) Implement and evaluate modern policy...  ...transformer-based action models, action chunking) and...  ...metrics and regression benchmarks that accurately predict...  ...reliable, production-quality training and evaluation... 
    Quality
    Flexible hours

    Chef Robotics

    San Francisco, CA
    2 days ago
  •  ...come from foundation models that generalize across...  ...Model. As a Senior ML Engineer, Foundation Models, you...  ...Your models won't just benchmark well; they'll serve...  ...Food Foundation Model — evaluating tradeoffs across generalization...  ...reliable, production-quality training and... 
    Quality
    Flexible hours

    Chef Robotics

    San Francisco, CA
    2 days ago
  •  ...bring cutting‑edge models into production....  ...build the platform engineers turn to ship AI products...  ...discover, evaluate, and select the right...  ...production‑quality execution. You’ll...  ...Partner with product, ML, and cross‑functional...  ...model evaluation, benchmarking, or comparison frameworks... 
    Quality
    Flexible hours

    Baseten

    San Francisco, CA
    3 days ago