Staff Evaluations Engineer: Model Feedback & Metrics

Reflection

Reflection is seeking a talented individual to conduct critical analysis and build evaluation frameworks to improve model capabilities. The ideal candidate will possess strong statistical analysis skills and familiarity with LLM evaluation methodologies. We offer top-tier compensation, health and wellness benefits, full parental leave, relocation support, and opportunities to connect with teammates through daily meals and team celebrations.

Vacancy posted 5 days ago
Similar jobs that could be interesting for you
Based on the Staff Evaluations Engineer: Model Feedback & Metrics in San Francisco, CA vacancy
  • $204k - $259k

     ...leading autonomous driving technology company in San Francisco is seeking an experienced engineer to develop evaluation techniques for machine learning models. The role involves metrics development, simulation strategies, and collaboration with top-tier engineering teams.... 
    Suggested

    Waymo

    San Francisco, CA
    4 days ago
  •  ...edge multimodal foundation models that have the ability to comprehend...  ...data preparation and model evaluation. This role comes with high...  ...in obtaining evaluation metrics and feedback. Portfolio Monitoring :...  ...: Partner with Engineering and AI Model teams to align... 
    Suggested
    Work at office
    Worldwide
    Flexible hours

    Twelve Labs, Inc

    San Francisco, CA
    5 days ago
  •  ...applying Machine Learning models, with a significant focus on...  ...large models (e.g., from human feedback/preferences), (Desirable)...  ...designing and using metrics for evaluating complex AI systems, (Desirable...  ...for researchers and software engineers who are passionate about... 
    Suggested

    Waymo

    San Francisco, CA
    3 days ago
  •  ...of experience in data architecture and proficiency with cloud platforms. Responsibilities include evaluating AI responses, refining models, and providing structured feedback on data architecture topics. The ideal candidate should have a bachelor's degree in a relevant... 
    Suggested
    Remote work

    YO IT Consulting

    San Francisco, CA
    4 days ago
  • $208k - $300k

    Machine Learning Engineer - Model Evaluations, Public Sector San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC Ready to Apply? Join...  ...models across functional, performance, robustness, and safety metrics, including LLM‑judge‑based evaluations. Design test... 
    Suggested
    Full time

    Scale AI, Inc.

    San Francisco, CA
    2 days ago
  • $180k - $270k

     ...clear, defensible, and automated metrics that researchers and leadership...  ...on. Possess strong software engineering skills (especially in Python) and...  ...systems, data pipelines, or evaluation harnesses that can run at scale against live model checkpoints. Can deeply partner... 
    Full time
    Work at office
    Worldwide

    Plaud

    San Francisco, CA
    2 days ago
  • A cutting-edge AI company located in San Francisco is seeking an ML Eval Engineer to enhance model evaluations and ensure quality metrics. This role involves designing benchmarks, collaborating with teams to identify model weaknesses, and developing automated processes... 

    Reducto, Inc.

    San Francisco, CA
    4 days ago
  •  ...solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with...  ...the ML and engineering teams. You will develop metrics, conduct evaluations, and contribute to model enhancements... 

    Reducto

    San Francisco, CA
    3 days ago
  •  ...About the Role We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently...  ...regressions across versions. Develop evaluation metrics for realism, consistency, and performance. Integrate... 

    SPREEAI

    San Francisco, CA
    4 days ago
  • $204k - $259k

     ...simulation across 15+ U.S. states. The Large Model Evaluation team is at the nexus of Waymo’s AI...  ...are looking for quantitatively‑minded engineers to research and propose new ways to...  ...Driver. You will: Develop novel metrics and sampling techniques to measure the... 
    Full time
    Remote work

    Waymo

    San Francisco, CA
    4 days ago
  • Welocalize is seeking a Data Quality Associate to evaluate AI model outputs and provide structured feedback. This is a full-time, onsite role located in San Francisco. The ideal candidate possesses a Bachelor's degree and has 1-2 years of professional writing experience... 
    Full time

    Welocalize

    San Francisco, CA
    3 days ago
  • Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations... 
    Remote job

    Jaide Health

    San Francisco, CA
    3 days ago
  • Welocalize is seeking a Data Quality Associate based in San Francisco for a full-time position. This role involves evaluating AI outputs and providing detailed feedback, with applicants needing native-level language proficiency and a university degree. Successful candidates... 
    Full time

    Welocalize

    San Francisco, CA
    3 days ago
  •  ...Anthropic is seeking a Research Lead for the Training Insights team to shape the evaluation of model capabilities. This hands-on leadership role involves developing innovative evaluation methodologies and mentoring a team of researchers. You will play a crucial role in... 
    Remote work

    Anthropic

    San Francisco, CA
    6 days ago
  •  ...organization in San Francisco is seeking a Research Lead for the Training Insights team. This role involves developing evaluation strategies for model capabilities and leading a team of researchers. Responsibilities include innovating evaluation methodologies and shaping... 

    Anthropic

    San Francisco, CA
    4 days ago
  • $186.3k - $268.1k

     ...worldwide. We’re a team of engineers, clinicians, and...  ...Job Duties Select and evaluate suppliers by leading efforts...  ...and influence using metrics for supplier part cost...  ...development cycle to feedback manufacturability...  ...CAD experience with 3D modeling tools is a plus Additional... 
    Contract work
    Local area
    Worldwide
    Flexible hours
    Shift work

    Intuitive

    San Francisco, CA
    5 days ago
  • $50 - $75 per hour

    A leading tech company based in Australia is seeking an AI Model Evaluator on a contract basis. The role involves evaluating AI-generated responses, writing prompts, and providing justifications based on specific criteria. Ideal candidates will hold a Master's degree in... 
    Hourly pay
    Contract work

    Mercor

    San Francisco, CA
    5 days ago
  •  ...of AI to bring cutting‑edge models into production. We’re...  ...and help build the platform engineers turn to ship AI products. THE...  ...helping developers discover, evaluate, and select the right...  ...reliability: define success metrics, establish feedback loops, and ensure a consistent... 
    Flexible hours

    Baseten

    San Francisco, CA
    3 days ago
  • Twelve-Labs in San Francisco is seeking a dedicated member for our ML Data Team to lead video data preparation and evaluation. This role includes defining dataset needs, automating processes, and enhancing data quality through collaboration. Ideal candidates should have... 
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    2 days ago
  • $180k - $270k

     ...transformative AI tools for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering... 

    Plaud

    San Francisco, CA
    2 days ago
  • YO IT Consulting is seeking a Senior Propulsion Engineer to evaluate AI-generated content related to propulsion engineering. This remote role...  ...would be advantageous. Join a team challenging AI language models to improve their technical reasoning.
    Remote job

    YO IT Consulting

    San Francisco, CA
    5 days ago
  • $15 - $20 per hour

    Mercor is seeking a Generalist with proficiency in English and Kannada to conduct fact-checking and generate evaluation data. This role involves assessing model response quality and ensuring alignment with conversational guidelines. The ideal candidate will possess a Bachelor... 
    Remote job
    Hourly pay

    Mercor

    San Francisco, CA
    2 days ago
  • $15 - $20 per hour

     ...Responsibilities include fact-checking and generating high-quality human evaluation data for AI systems. Ideal candidates will have a Bachelor's degree, significant experience with large language models, and excellent writing skills in English. This role offers $15-$20/... 
    Remote job

    Mercor

    San Francisco, CA
    5 days ago
  • Introducing Moonlake, AI for creating world simulations. Modeling & architecture Build and iterate on 2D/3D/image/video/audio diffusion architectures Work on conditioning: text/image/pose/layout/control signals, multi-modal encoders, guidance strategies. Training &... 

    Embedding VC

    San Francisco, CA
    3 days ago
  • A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have... 
    Remote work

    Cohere

    San Francisco, CA
    5 days ago
  • $192k - $260k

    A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams... 

    Databricks

    San Francisco, CA
    5 days ago
  • $208k - $300k

     ...A leading AI company is seeking a Machine Learning Engineer in the Public Sector to develop automated evaluation pipelines for AI models. You will work on advanced AI systems and ensure they perform reliably in mission-critical environments. Ideal candidates have a strong... 

    Scale AI

    San Francisco, CA
    4 days ago
  • $180k

    A technology company focused on AI is seeking experienced software engineers to develop robust data pipelines and automation frameworks. This role involves creating and maintaining evaluation tasks and improving operational procedures for RL training. The ideal candidate... 

    xAI

    San Francisco, CA
    4 days ago
  • $320k

    Anthropic in New York City is seeking a Research Engineer to develop evaluations for Claude’s capabilities. The ideal candidate should have strong Python...  ...during training runs. The role offers a hybrid work model and competitive compensation ranging from $320,000 to $485... 
    Remote job

    Menlo Ventures

    San Francisco, CA
    5 days ago
  • Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... 
    Full time

    Refresh AI

    San Francisco, CA
    3 days ago
