Staff Evaluations Engineer: Model Feedback & Metrics

Reflection

Reflection is seeking a talented individual to conduct critical analysis and build evaluation frameworks to improve model capabilities. The ideal candidate will possess strong statistical analysis skills and familiarity with LLM evaluation methodologies. We offer top-tier compensation, health and wellness benefits, full parental leave, relocation support, and opportunities to connect with teammates through daily meals and team celebrations.

Vacancy posted 5 days ago

Similar jobs that could be interesting for you
Based on the Staff Evaluations Engineer: Model Feedback & Metrics in San Francisco, CA vacancy

Senior ML Metrics Engineer for Large-Model Evaluation
$204k - $259k
...leading autonomous driving technology company in San Francisco is seeking an experienced engineer to develop evaluation techniques for machine learning models. The role involves metrics development, simulation strategies, and collaboration with top-tier engineering teams....
Suggested
Waymo
San Francisco, CA
4 days ago
Model Evaluation & Data Quality Lead
...edge multimodal foundation models that have the ability to comprehend... ...data preparation and model evaluation. This role comes with high... ...in obtaining evaluation metrics and feedback. Portfolio Monitoring :... ...: Partner with Engineering and AI Model teams to align...
Suggested
Work at office
Worldwide
Flexible hours
Twelve Labs, Inc
San Francisco, CA
5 days ago
Senior Staff ML Engineer (Driver Understanding and Evaluation)
...applying Machine Learning models, with a significant focus on... ...large models (e.g., from human feedback/preferences) , (Desirable)... ...designing and using metrics for evaluating complex AI systems , (Desirable... ...for researchers and software engineers who are passionate about...
Suggested
Waymo
San Francisco, CA
3 days ago
Senior Data Architect for AI Model Evaluation (Remote)
...of experience in data architecture and proficiency with cloud platforms. Responsibilities include evaluating AI responses, refining models, and providing structured feedback on data architecture topics. The ideal candidate should have a bachelor's degree in a relevant...
Suggested
Remote work
YO IT Consulting
San Francisco, CA
4 days ago
Senior Machine Learning Engineer - Model Evaluations, Public Sector, New York, NY
$208k - $300k
Machine Learning Engineer - Model Evaluations, Public Sector San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC Ready to Apply? Join... ...models across functional, performance, robustness, and safety metrics, including LLM‑judge‑based evaluations. Design test...
Suggested
Full time
Scale AI, Inc.
San Francisco, CA
2 days ago
Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco
$180k - $270k
...clear, defensible, and automated metrics that researchers and leadership... ...on. Possess strong software engineering skills (especially in Python) and... ...systems, data pipelines, or evaluation harnesses that can run at scale against live model checkpoints. Can deeply partner...
Full time
Work at office
Worldwide
Plaud
San Francisco, CA
2 days ago
ML Evaluation Engineer: Benchmark & Model Quality
A cutting-edge AI company located in San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating with teams to identify model weaknesses, and developing automated processes...
Reducto, Inc.
San Francisco, CA
4 days ago
ML Evaluation Engineer: Benchmark & Model Quality
...solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with... ...the ML and engineering teams. You will develop metrics, conduct evaluations, and contribute to model enhancements...
Reducto
San Francisco, CA
3 days ago
Software Engineer (Model Evaluation & Benchmarking)
...About the Role We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently... ...regressions across versions. Develop evaluation metrics for realism, consistency, and performance. Integrate...
SPREEAI
San Francisco, CA
4 days ago
Senior Software Engineer (Large Model Evaluation)
$204k - $259k
...simulation across 15+ U.S. states. The Large Model Evaluation team is at the nexus of Waymo’s AI... ...are looking for quantitatively‑minded engineers to research and propose new ways to... ...Driver. You will: Develop novel metrics and sampling techniques to measure the...
Full time
Remote work
Waymo
San Francisco, CA
4 days ago
AI Data Quality & Model Evaluation Associate
Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience...
Full time
Welocalize
San Francisco, CA
3 days ago
Staff ML Inference Engineer — Model Efficiency (Remote)
Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations...
Remote job
Jaide Health
San Francisco, CA
3 days ago
AI Model Evaluator & Data Quality Analyst
Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates...
Full time
Welocalize
San Francisco, CA
3 days ago
Research Lead, Model Evaluation & Training Insights
...Anthropic is seeking a Research Lead for the Training Insights team to shape the evaluation of model capabilities. This hands-on leadership role involves developing innovative evaluation methodologies and mentoring a team of researchers. You will play a crucial role in...
Remote work
Anthropic
San Francisco, CA
6 days ago
Lead, Training Insights & Model Evaluation
...organization in San Francisco is seeking a Research Lead for the Training Insights team. This role involves developing evaluation strategies for model capabilities and leading a team of researchers. Responsibilities include innovating evaluation methodologies and shaping...
Anthropic
San Francisco, CA
4 days ago
Staff Supplier Engineer - Camera and Cabling
$186.3k - $268.1k
...worldwide. We’re a team of engineers, clinicians, and... ...Job Duties Select and evaluate suppliers by leading efforts... ...and influence using metrics for supplier part cost... ...development cycle to feedback manufacturability... ...CAD experience with 3D modeling tools is a plus Additional...
Contract work
Local area
Worldwide
Flexible hours
Shift work
Intuitive
San Francisco, CA
5 days ago
Finance AI Model Evaluator - Contract, 20 hrs/week
$50 - $75 per hour
A leading tech company based in Australia is seeking an AI Model Evaluator on a contract basis. The role involves evaluating AI-generated responses, writing prompts, and providing justifications based on specific criteria. Ideal candidates will hold a Master's degree in...
Hourly pay
Contract work
Mercor
San Francisco, CA
5 days ago
Engineering Manager, Model Library
...of AI to bring cutting‑edge models into production. We’re... ...and help build the platform engineers turn to ship AI products. THE... ...helping developers discover, evaluate, and select the right... ...reliability: define success metrics, establish feedback loops, and ensure a consistent...
Flexible hours
Baseten
San Francisco, CA
3 days ago
AI Model Evaluation Leader — Data Quality
Twelve-Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have...
Flexible hours
Twelve-Labs
San Francisco, CA
2 days ago
Speech LLM Model Evaluations Engineer - Hybrid
$180k - $270k
...transformative AI tools for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering...
Plaud
San Francisco, CA
2 days ago
Remote Propulsion Engineer for AI Model Evaluation
YO IT Consulting is seeking a Senior Propulsion Engineer to evaluate AI-generated content related to propulsion engineering. This remote role... ...would be advantageous. Join a team challenging AI language models to improve their technical reasoning.
Remote job
YO IT Consulting
San Francisco, CA
5 days ago
Remote Kannada Evaluator for AI Model Quality
$15 - $20 per hour
Mercor is seeking a Generalist with proficiency in English and Kannada to conduct fact-checking and generate evaluation data. This role involves assessing model response quality and ensuring alignment with conversational guidelines. The ideal candidate will possess a Bachelor...
Remote job
Hourly pay
Mercor
San Francisco, CA
2 days ago
Remote Odia LLM Analyst — Model Evaluation
$15 - $20 per hour
...Responsibilities include fact-checking and generating high-quality human evaluation data for AI systems. Ideal candidates will have a Bachelor's degree, significant experience with large language models, and excellent writing skills in English. This role offers $15-$20/...
Remote job
Mercor
San Francisco, CA
5 days ago
Staff Diffusion Model Engineer — Multimodal & Inpainting
Introducing Moonlake, AI for creating world simulations. Modeling & architecture Build and iterate on 2D/3D/image/video/audio diffusion architectures Work on conditioning: text/image/pose/layout/control signals, multi-modal encoders, guidance strategies. Training &...
Embedding VC
San Francisco, CA
3 days ago
Staff Engineer - ML Inference & Model Efficiency
A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have...
Remote work
Cohere
San Francisco, CA
5 days ago
Staff Engineer: Foundation Model API & GPU Inference
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
Databricks
San Francisco, CA
5 days ago
ML Engineer, Public Sector: Model Evaluations & Safety
$208k - $300k
...A leading AI company is seeking a Machine Learning Engineer in the Public Sector to develop automated evaluation pipelines for AI models. You will work on advanced AI systems and ensure they perform reliably in mission-critical environments. Ideal candidates have a strong...
Scale AI
San Francisco, CA
4 days ago
Staff Engineer, RL Infrastructure & AI Evaluation
$180k
A technology company focused on AI is seeking experienced software engineers to develop robust data pipelines and automation frameworks. This role involves creating and maintaining evaluation tasks and improving operational procedures for RL training. The ideal candidate...
xAI
San Francisco, CA
4 days ago
Research Engineer, Model Evaluations - Remote-Friendly Impact
$320k
Anthropic in New York City is seeking a Research Engineer to develop evaluations for Claude’s capabilities. The ideal candidate should have strong Python... ...during training runs. The role offers a hybrid work model and competitive compensation ranging from $320,000 to $485...
Remote job
Menlo Ventures
San Francisco, CA
5 days ago
Benchmarking Research Engineer: Frontier Model Evaluations
Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning...
Full time
Refresh AI
San Francisco, CA
3 days ago
