Staff ML Research Scientist, LLM Evaluations & Benchmarks

Scale AI, Inc.

A leading AI evaluation company is looking for a Staff Machine Learning Research Scientist to advance LLM evaluation methodologies. This role involves designing benchmarks, collaborating with teams, and mentoring others. Ideal candidates have significant experience in NLP and have published research in major conferences. Competitive compensation includes salary, equity, and comprehensive benefits. Preferred location is San Francisco.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Staff ML Research Scientist, LLM Evaluations & Benchmarks in San Francisco, CA vacancy
  • $264.8k - $331k

     ...leading data and evaluation partner for...  ...evaluation and benchmarking of large language...  ...industry-leading LLM evals, setting...  .... Our Research teams work with...  ...research. As a Staff Machine Learning Research Scientist on the LLM Evals...  ...pipelines using modern ML frameworks.... 
    Suggested
    Full time

    DiversityJobs Inc

    San Francisco, CA
    13 days ago
  • A leading AI evaluation firm based in San Francisco seeks a Machine Learning Scientist to foster understanding of AI model performance. You'll engage in designing and analyzing comprehensive experiments while collaborating across teams. Applicants should possess a PhD in... 
    Suggested

    Arena Intelligence, Inc.

    San Francisco, CA
    4 days ago
  • $197.4k - $246.75k

    Scale Labs, Research Scientist — Frontier Risk Evaluations As the leading data and evaluation partner...  ...and instrumenting ML pipelines, writing evaluation...  ...crafting evaluations and benchmarks, or a background in data science roles related to LLM technologies. Experience... 
    Suggested
    Full time

    Scale AI, Inc.

    San Francisco, CA
    4 days ago
  • $302.4k - $378k

     ...building upon our prior model evaluation work with enterprise...  ...team, part of Scale's Research organization, brings...  ...and RL reward signals, benchmarking autonomous agent performance...  ...not only expertise in LLM agents and planning...  ...published research in top ML venues (e.g., ACL,... 
    Suggested
    Full time

    Scale AI

    San Francisco, CA
    6 days ago
  • $250k - $350k

     ...of what's possible in LLM post-training. If you love...  ..., and turning research insights into products...  ...distillation, training, evaluation, and planet-scale hosting...  ...settings Create robust benchmarks and evaluation frameworks...  ...performance Stay current with ML research and identify... 
    Suggested
    Work at office

    Inference

    San Francisco, CA
    1 day ago
  • $252k - $315k

     ...Machine Learning Research Scientist, Reasoning San Francisco, CA; Seattle...  ...Building on our history of model evaluation with enterprise and...  ...types critical for advancing LLM-based agents, including browser...  ...of published research in top ML and NLP venues (e.g., ACL, EMNLP... 
    Full time

    Scale AI

    San Francisco, CA
    5 days ago
  •  ...Staff / Principal ML Training Systems Engineer We are building...  ...heavily in research, infrastructure, and...  ...Develop automated benchmarking and regression detection...  ...directly with research scientists and ML engineers in...  ...variable-length data Evaluation cadence and rollout... 

    Seer

    San Francisco, CA
    5 hours ago
  • $250k - $325k

     ...RAG [2023] Large-scale LLM‑based legal fact...  ...What, and Who Why: AI Researchers are the engine of innovation...  ...legal‑specific tasks. Evaluate emerging work in agentic...  ...and maintain datasets, benchmarks, and evals for training...  ...Publications at top venues—e.g., ML/AI conferences (NeurIPS... 
    Contract work
    Immediate start

    Ivo

    San Francisco, CA
    1 day ago
  • Fleet AI, Inc. is seeking a Research Scientist to join their core research team in San Francisco. This role focuses on investigating...  ...with leading labs. Key responsibilities include generating benchmarks to evaluate frontier models, automating environment construction for... 

    Fleet AI, Inc.

    San Francisco, CA
    1 day ago
  •  ...agents, and we are hiring a Research Scientist to advance the neuro-symbolic...  .... Publish at top AI and ML conferences (NeurIPS, ICML,...  ...with full observability and benchmarking. Engage with the Bay Area academic...  ..., etc.). Familiarity with LLM reasoning, retrieval-augmented... 
    Work at office
    Relocation

    Rippletide SAS

    San Francisco, CA
    1 day ago
  • $197.4k - $246.75k

    Scale Labs, Research Scientist — AI Controls and Monitoring As the leading data and evaluation partner for frontier AI companies, Scale plays...  ...to establish standards and benchmarks for AI monitoring and...  ...experience addressing sophisticated ML problems, whether in a... 

    Scale AI, Inc.

    San Francisco, CA
    3 days ago
  • $315k - $340k

    [Expression of Interest] Research Scientist/Engineer, Honesty About Anthropic...  ...comprehensive honesty benchmarks and evaluation frameworks Implement techniques...  .../PhD in Computer Science, ML, or related field Possess...  ...: Currently, we expect all staff to be in one of our offices... 
    Full time
    Work at office
    Visa sponsorship
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    3 days ago
  • $350k

     ...growing group of committed researchers, engineers, policy...  ...full stack of audio ML, developing audio codecs...  ..., from performance benchmarking to kernel optimization...  ...audio Creating robust evaluation methodologies for hard...  ...Currently, we expect all staff to be in one of our... 
    Full time
    Work at office
    Visa sponsorship
    Flexible hours
    Shift work

    Menlo Ventures

    San Francisco, CA
    5 days ago
  • $280k

     ...group of committed researchers, engineers, policy...  ...yourself as both a scientist and an engineer. As...  ...tooling to efficiently evaluate the effectiveness of novel LLM-generated...  ...significant software, ML, or research engineering...  ..., we expect all staff to be in one of our... 
    Contract work
    For contractors
    For subcontractor
    Work at office
    Relocation
    Visa sponsorship
    Work visa
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    1 day ago
  • $357k

     ...Responsibilities Workato's AI Research Lab is seeking an...  ...Lead AI Research Scientist to join our growing team...  ...workflow graphs, and agent evaluation frameworks, while...  ...PyTorch or JAX and modern LLM frameworks. ~ Proven...  ...publication record in top-tier ML venues (NeurIPS, ICML,... 
    Work at office
    Remote work
    Flexible hours

    Workato

    San Francisco, CA
    13 days ago
  •  ...Cohere is a team of researchers, engineers, designers,...  ...not a typical "Applied Scientist" or "ML Engineer" role. As a Member of Technical Staff, Applied ML, you will:...  ...domains, design custom LLM solutions, and deliver...  ...agent integrations, model evaluations, and SOTA modeling... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    1 day ago
  •  ...Principal AI Security & Risk Researcher to join our founding research...  ...methodologies Stay current with LLM vulnerabilities, adversarial techniques...  ...agentic systems Develop risk evaluation methodologies that adapt as...  ..., with 2+ years focused on AI/ML security, red teaming, or... 
    Part time
    Remote work
    Flexible hours

    Ciph Lab

    San Francisco, CA
    4 days ago
  • $240k - $300k

     ...Data teams to create thought‑leading research, craft benchmark reports, and publish insights that make...  ...subject‑matter experts, data scientists, and customers to surface exclusive insights...  ...Experience analyzing markets and evaluating data from multiple sources to produce... 
    Work at office
    Remote work
    3 days per week

    United Cerebral Palsy of Georgia

    San Francisco, CA
    1 day ago
  • $240k - $300k

    Staff Economist San Francisco, California, United States Staff...  ...to create thought-leading research, craft benchmark reports, and publish...  ...‑matter experts, data scientists, and customers to surface exclusive...  ...analyzing markets and evaluating data from multiple sources... 
    Work at office
    Remote work
    Work from home
    3 days per week

    Brex Inc.

    San Francisco, CA
    4 days ago
  • $200k - $280k

     ...performance computing for ML. Are comfortable...  .... Have a solid research foundation in your area...  ...rollout collection and evaluation cheaper. Use these...  ...Establish metrics, benchmarks, and experimentation frameworks...  ...technical leadership (Staff level) Set... 
    Full time

    Together AI

    San Francisco, CA
    2 days ago
  • $150k - $250k

     ...goods, and global social organizations. We research and deploy technologies that power AI-...  ...to drive incremental improvements on benchmarks or optimize an existing process but...  ...progress is measured. Researchers design evaluation frameworks that capture reasoning depth... 
    Work at office
    3 days per week

    Distyl AI

    San Francisco, CA
    1 day ago
  •  ...care. Founded by Stanford AI scientists with deep clinical...  ...Overview We are seeking an ML Scientist (Research) to advance Knowtex's voice...  ...focuses on developing and evaluating novel machine learning approaches...  ...Transformer-based LLM architectures Triton Inference... 

    Knowtex

    San Francisco, CA
    1 day ago
  • $141.1k - $262.1k

     ...development. Roche's Research and Early...  ...(AI) to assist our scientists in both pRED and gRED...  ...machine learning (ML) techniques. We are...  ...training signals, and evaluation criteria. Evaluation & Benchmarks: Design and implement...  ...work experience. LLM Expertise: Experience... 
    Work experience placement
    Local area
    Worldwide
    Relocation package

    Genentech

    South San Francisco, CA
    4 days ago
  • $176k - $304k

     ...ML Scientist I / II, Foundation Models for Life Sciences San Francisco...  ...the Foundation Models team researches and develops large-scale...  ...formulation, model design, training, evaluation, and integration into Lila's...  ...ML tools, frameworks, or benchmark datasets for scientific... 
    Full time
    Work at office
    Local area
    Flexible hours

    Lila Sciences

    San Francisco, CA
    2 days ago
  •  ...Senior / Principal ML Scientist Merge Labs is a frontier research lab with the mission of bridging biological and artificial intelligence to maximize...  ...acquisition-strategy using internal and public datasets; benchmark and validate model performance. Integrate ML... 

    Merge Labs, Inc.

    San Francisco, CA
    3 days ago
  • $150k - $300k

     ...across time, causality, and context. As a Research Scientist, you will tackle fundamental problems in...  .... Key Responsibilities Build LLM-powered information extraction pipelines...  ...architectural exploration. ~ Strong ML and NLP foundation, particularly in information... 
    Relocation package
    Flexible hours

    Dynamis Labs

    San Francisco, CA
    1 day ago
  • Member of Technical Staff - Research Scientist Patronus AI is a frontier lab developing...  ...influential research in AI evaluation like FinanceBench, Lynx,...  ...outcomes such as papers, benchmarks, datasets, and platform...  ...code in Python and modern ML frameworks. Ability to execute... 

    Patronus AI, Inc.

    San Francisco, CA
    3 days ago
  • $160k - $250k

    Machine Learning Researcher, Audio Location: San Francisco, CA or Remote...  ...metrics and perceptual evaluations. Validate ideas quickly through...  ...Comfortable working with both offline benchmarks and live production metrics....  ...or telephony. PhD in ML, AI, or a related field, or... 
    Work at office
    Remote work

    Bland

    San Francisco, CA
    5 days ago
  • $180k - $260k

    BLAND is seeking a Machine Learning Researcher focused on multimodal LLM technology. The role involves developing conversational AI models that integrate speech, text, and real-time reasoning. Candidates should possess a strong background in machine learning, LLMs, and... 

    BLAND

    San Francisco, CA
    5 days ago
  • $259.2k - $324k

    Staff Machine Learning Research Scientist/ Engineer, Agents Join the team shaping the future of...  ...upon our prior model evaluation work with enterprise customers...  ...published research in top ML venues (e.g., ACL, EMNLP,...  ...experience with open source LLM fine-tuning or involvement... 
    Full time

    Scale AI, Inc.

    San Francisco, CA
    2 days ago
