Staff Evaluations Engineer: Model Feedback & Metrics
Reflection
Reflection is seeking a talented individual to conduct critical analysis and build evaluation frameworks to improve model capabilities. The ideal candidate will possess strong statistical analysis skills and familiarity with LLM evaluation methodologies. We offer top-tier compensation, health and wellness benefits, full parental leave, relocation support, and opportunities to connect with teammates through daily meals and team celebrations.
$204k - $259k
...leading autonomous driving technology company in San Francisco is seeking an experienced engineer to develop evaluation techniques for machine learning models. The role involves metrics development, simulation strategies, and collaboration with top-tier engineering teams....
- ...edge multimodal foundation models that have the ability to comprehend... ...data preparation and model evaluation. This role comes with high... ...in obtaining evaluation metrics and feedback. Portfolio Monitoring :... ...: Partner with Engineering and AI Model teams to align... Work at office, Worldwide, Flexible hours
- ...applying Machine Learning models, with a significant focus on... ...large models (e.g., from human feedback/preferences), (Desirable)... ...designing and using metrics for evaluating complex AI systems, (Desirable... ...for researchers and software engineers who are passionate about...
- ...of experience in data architecture and proficiency with cloud platforms. Responsibilities include evaluating AI responses, refining models, and providing structured feedback on data architecture topics. The ideal candidate should have a bachelor's degree in a relevant... Remote work
$208k - $300k
Machine Learning Engineer - Model Evaluations, Public Sector San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC Ready to Apply? Join... ...models across functional, performance, robustness, and safety metrics, including LLM‑judge‑based evaluations. Design test... Full time
$180k - $270k
...clear, defensible, and automated metrics that researchers and leadership... ...on. Possess strong software engineering skills (especially in Python) and... ...systems, data pipelines, or evaluation harnesses that can run at scale against live model checkpoints. Can deeply partner... Full time, Work at office, Worldwide
- A cutting-edge AI company located in San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating with teams to identify model weaknesses, and developing automated processes...
- ...solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with... ...the ML and engineering teams. You will develop metrics, conduct evaluations, and contribute to model enhancements...
- ...About the Role We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently... ...regressions across versions. Develop evaluation metrics for realism, consistency, and performance. Integrate...
$204k - $259k
...simulation across 15+ U.S. states. The Large Model Evaluation team is at the nexus of Waymo’s AI... ...are looking for quantitatively‑minded engineers to research and propose new ways to... ...Driver. You will: Develop novel metrics and sampling techniques to measure the... Full time, Remote work
- Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience... Full time
- Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations... Remote job
- Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates... Full time
- ...Anthropic is seeking a Research Lead for the Training Insights team to shape the evaluation of model capabilities. This hands-on leadership role involves developing innovative evaluation methodologies and mentoring a team of researchers. You will play a crucial role in... Remote work
- ...organization in San Francisco is seeking a Research Lead for the Training Insights team. This role involves developing evaluation strategies for model capabilities and leading a team of researchers. Responsibilities include innovating evaluation methodologies and shaping...
$186.3k - $268.1k
...worldwide. We’re a team of engineers, clinicians, and... ...Job Duties Select and evaluate suppliers by leading efforts... ...and influence using metrics for supplier part cost... ...development cycle to feedback manufacturability... ...CAD experience with 3D modeling tools is a plus Additional... Contract work, Local area, Worldwide, Flexible hours, Shift work
$50 - $75 per hour
A leading tech company based in Australia is seeking an AI Model Evaluator on a contract basis. The role involves evaluating AI-generated responses, writing prompts, and providing justifications based on specific criteria. Ideal candidates will hold a Master's degree in... Hourly pay, Contract work
- ...of AI to bring cutting‑edge models into production. We’re... ...and help build the platform engineers turn to ship AI products. THE... ...helping developers discover, evaluate, and select the right... ...reliability: define success metrics, establish feedback loops, and ensure a consistent... Flexible hours
- Twelve-Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have... Flexible hours
$180k - $270k
...transformative AI tools for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering...
- YO IT Consulting is seeking a Senior Propulsion Engineer to evaluate AI-generated content related to propulsion engineering. This remote role... ...would be advantageous. Join a team challenging AI language models to improve their technical reasoning. Remote job
$15 - $20 per hour
Mercor is seeking a Generalist with proficiency in English and Kannada to conduct fact-checking and generate evaluation data. This role involves assessing model response quality and ensuring alignment with conversational guidelines. The ideal candidate will possess a Bachelor... Remote job, Hourly pay
$15 - $20 per hour
...Responsibilities include fact-checking and generating high-quality human evaluation data for AI systems. Ideal candidates will have a Bachelor's degree, significant experience with large language models, and excellent writing skills in English. This role offers $15-$20/... Remote job
- Introducing Moonlake, AI for creating world simulations. Modeling & architecture Build and iterate on 2D/3D/image/video/audio diffusion architectures Work on conditioning: text/image/pose/layout/control signals, multi-modal encoders, guidance strategies. Training &...
- A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have... Remote work
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
$208k - $300k
...A leading AI company is seeking a Machine Learning Engineer in the Public Sector to develop automated evaluation pipelines for AI models. You will work on advanced AI systems and ensure they perform reliably in mission-critical environments. Ideal candidates have a strong...
$180k
A technology company focused on AI is seeking experienced software engineers to develop robust data pipelines and automation frameworks. This role involves creating and maintaining evaluation tasks and improving operational procedures for RL training. The ideal candidate...
$320k
Anthropic in New York City is seeking a Research Engineer to develop evaluations for Claude’s capabilities. The ideal candidate should have strong Python... ...during training runs. The role offers a hybrid work model and competitive compensation ranging from $320,000 to $485... Remote job
- Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... Full time

