Staff ML Research Scientist, LLM Evaluations & Benchmarks

Scale AI, Inc.

A leading AI evaluation company is looking for a Staff Machine Learning Research Scientist to advance LLM evaluation methodologies. This role involves designing benchmarks, collaborating with teams, and mentoring others. Ideal candidates have significant experience in NLP and have published research in major conferences. Competitive compensation includes salary, equity, and comprehensive benefits. Preferred location is San Francisco.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you, based on the Staff ML Research Scientist, LLM Evaluations & Benchmarks vacancy in San Francisco, CA

Staff Machine Learning Research Scientist, LLM Evals
$264.8k - $331k
...leading data and evaluation partner for... ...evaluation and benchmarking of large language... ...industry-leading LLM evals, setting... .... Our Research teams work with... ...research. As a Staff Machine Learning Research Scientist on the LLM Evals... ...pipelines using modern ML frameworks....
Suggested
Full time
DiversityJobs Inc
San Francisco, CA
13 days ago
ML Scientist, AI Evaluation & Benchmarking
A leading AI evaluation firm based in San Francisco seeks a Machine Learning Scientist to foster understanding of AI model performance. You'll engage in designing and analyzing comprehensive experiments while collaborating across teams. Applicants should possess a PhD in...
Suggested
Arena Intelligence, Inc.
San Francisco, CA
4 days ago
Research Scientist, Frontier Risk Evaluations
$197.4k - $246.75k
Scale Labs, Research Scientist — Frontier Risk Evaluations As the leading data and evaluation partner... ...and instrumenting ML pipelines, writing evaluation... ...crafting evaluations and benchmarks, or a background in data science roles related to LLM technologies. Experience...
Suggested
Full time
Scale AI, Inc.
San Francisco, CA
4 days ago
Senior / Staff Machine Learning Research Scientist, Agents
$302.4k - $378k
...building upon our prior model evaluation work with enterprise... ...team, part of Scale's Research organization, brings... ...and RL reward signals, benchmarking autonomous agent performance... ...not only expertise in LLM agents and planning... ...published research in top ML venues (e.g., ACL,...
Suggested
Full time
Scale AI
San Francisco, CA
6 days ago
Machine Learning Researcher
$250k - $350k
...of what's possible in LLM post-training. If you love... ..., and turning research insights into products... ...distillation, training, evaluation, and planet-scale hosting... ...settings Create robust benchmarks and evaluation frameworks... ...performance Stay current with ML research and identify...
Suggested
Work at office
Inference
San Francisco, CA
1 day ago
Machine Learning Research Scientist, Reasoning
$252k - $315k
...Machine Learning Research Scientist, Reasoning San Francisco, CA; Seattle... ...Building on our history of model evaluation with enterprise and... ...types critical for advancing LLM-based agents, including browser... ...of published research in top ML and NLP venues (e.g., ACL, EMNLP...
Full time
Scale AI
San Francisco, CA
5 days ago
Research Scientist: Post-Training
...Staff / Principal ML Training Systems Engineer We are building... ...heavily in research, infrastructure, and... ...Develop automated benchmarking and regression detection... ...directly with research scientists and ML engineers in... ...variable-length data Evaluation cadence and rollout...
Seer
San Francisco, CA
5 hours ago
AI Researcher
$250k - $325k
...RAG [2023] Large-scale LLM‑based legal fact... ...What, and Who Why: AI Researchers are the engine of innovation... ...legal‑specific tasks. Evaluate emerging work in agentic... ...and maintain datasets, benchmarks, and evals for training... ...Publications at top venues—e.g., ML/AI conferences (NeurIPS...
Contract work
Immediate start
Ivo
San Francisco, CA
1 day ago
Research Scientist: AI Environments & Benchmarks
Fleet AI, Inc. is seeking a Research Scientist to join their core research team in San Francisco. This role focuses on investigating... ...with leading labs. Key responsibilities include generating benchmarks to evaluate frontier models, automating environment construction for...
Fleet AI, Inc.
San Francisco, CA
1 day ago
Research Scientist — Neuro-Symbolic AI
...agents, and we are hiring a Research Scientist to advance the neuro-symbolic... .... Publish at top AI and ML conferences (NeurIPS, ICML,... ...with full observability and benchmarking. Engage with the Bay Area academic... ..., etc.). Familiarity with LLM reasoning, retrieval-augmented...
Work at office
Relocation
Rippletide SAS
San Francisco, CA
1 day ago
Research Scientist, AI Controls and Monitoring
$197.4k - $246.75k
Scale Labs, Research Scientist — AI Controls and Monitoring As the leading data and evaluation partner for frontier AI companies, Scale plays... ...to establish standards and benchmarks for AI monitoring and... ...experience addressing sophisticated ML problems, whether in a...
Scale AI, Inc.
San Francisco, CA
3 days ago
[Expression of Interest] Research Scientist / Engineer, Honesty
$315k - $340k
[Expression of Interest] Research Scientist/Engineer, Honesty About Anthropic... ...comprehensive honesty benchmarks and evaluation frameworks Implement techniques... .../PhD in Computer Science, ML, or related field Possess... ...: Currently, we expect all staff to be in one of our offices...
Full time
Work at office
Visa sponsorship
Flexible hours
Menlo Ventures
San Francisco, CA
3 days ago
Research Engineer/Research Scientist, Audio
$350k
...growing group of committed researchers, engineers, policy... ...full stack of audio ML, developing audio codecs... ..., from performance benchmarking to kernel optimization... ...audio Creating robust evaluation methodologies for hard... ...Currently, we expect all staff to be in one of our...
Full time
Work at office
Visa sponsorship
Flexible hours
Shift work
Menlo Ventures
San Francisco, CA
5 days ago
Research Engineer / Scientist, Alignment Science
$280k
...group of committed researchers, engineers, policy... ...yourself as both a scientist and an engineer. As... ...tooling to efficiently evaluate the effectiveness of novel LLM-generated... ...significant software, ML, or research engineering... ..., we expect all staff to be in one of our...
Contract work
For contractors
For subcontractor
Work at office
Relocation
Visa sponsorship
Work visa
Flexible hours
Menlo Ventures
San Francisco, CA
1 day ago
Lead AI Research Scientist
$357k
...Responsibilities Workato's AI Research Lab is seeking an... ...Lead AI Research Scientist to join our growing team... ...workflow graphs, and agent evaluation frameworks, while... ...PyTorch or JAX and modern LLM frameworks. ~ Proven... ...publication record in top-tier ML venues (NeurIPS, ICML,...
Work at office
Remote work
Flexible hours
Workato
San Francisco, CA
13 days ago
Member of Technical Staff, Senior/Staff MLE
...Cohere is a team of researchers, engineers, designers,... ...not a typical "Applied Scientist" or "ML Engineer" role. As a Member of Technical Staff, Applied ML, you will:... ...domains, design custom LLM solutions, and deliver... ...agent integrations, model evaluations, and SOTA modeling...
Full time
Work at office
Remote work
Flexible hours
Cohere
San Francisco, CA
1 day ago
Principal AI Security & Risk Researcher
...Principal AI Security & Risk Researcher to join our founding research... ...methodologies Stay current with LLM vulnerabilities, adversarial techniques... ...agentic systems Develop risk evaluation methodologies that adapt as... ..., with 2+ years focused on AI/ML security, red teaming, or...
Part time
Remote work
Flexible hours
Ciph Lab
San Francisco, CA
4 days ago
Staff Economist
$240k - $300k
...Data teams to create thought‑leading research, craft benchmark reports, and publish insights that make... ...subject‑matter experts, data scientists, and customers to surface exclusive insights... ...Experience analyzing markets and evaluating data from multiple sources to produce...
Work at office
Remote work
3 days per week
United Cerebral Palsy of Georgia
San Francisco, CA
1 day ago
Staff Economist San Francisco, California, United States
$240k - $300k
Staff Economist, San Francisco, California, United States. Staff... ...to create thought-leading research, craft benchmark reports, and publish... ...‑matter experts, data scientists, and customers to surface exclusive... ...analyzing markets and evaluating data from multiple sources...
Work at office
Remote work
Work from home
3 days per week
Brex Inc.
San Francisco, CA
4 days ago
AI Researcher, Core ML (Turbo)
$200k - $280k
...performance computing for ML. Are comfortable... .... Have a solid research foundation in your area... ...rollout collection and evaluation cheaper. Use these... ...Establish metrics, benchmarks, and experimentation frameworks... ...technical leadership (Staff level) Set...
Full time
Together AI
San Francisco, CA
2 days ago
Applied AI Researcher, Benchmarking
$150k - $250k
...goods, and global social organizations. We research and deploy technologies that power AI-... ...to drive incremental improvements on benchmarks or optimize an existing process but... ...progress is measured. Researchers design evaluation frameworks that capture reasoning depth...
Work at office
3 days per week
Distyl AI
San Francisco, CA
1 day ago
ML Scientist (Research)
...care. Founded by Stanford AI scientists with deep clinical... ...Overview We are seeking an ML Scientist (Research) to advance Knowtex's voice... ...focuses on developing and evaluating novel machine learning approaches... ...Transformer-based LLM architectures Triton Inference...
Knowtex
San Francisco, CA
1 day ago
Machine Learning Scientist, Scientific Reasoning Models, AI for Drug Discovery
$141.1k - $262.1k
...development. Roche's Research and Early... ...(AI) to assist our scientists in both pRED and gRED... ...machine learning (ML) techniques. We are... ...training signals, and evaluation criteria. Evaluation & Benchmarks: Design and implement... ...work experience. LLM Expertise: Experience...
Work experience placement
Local area
Worldwide
Relocation package
Genentech
South San Francisco, CA
4 days ago
ML Scientist I / II, Foundation Models for Life Sciences
$176k - $304k
...ML Scientist I / II, Foundation Models for Life Sciences San Francisco... ...the Foundation Models team researches and develops large-scale... ...formulation, model design, training, evaluation, and integration into Lila's... ...ML tools, frameworks, or benchmark datasets for scientific...
Full time
Work at office
Local area
Flexible hours
Lila Sciences
San Francisco, CA
2 days ago
ML Research Scientist - Bayesian Optimization
...Senior / Principal ML Scientist Merge Labs is a frontier research lab with the mission of bridging biological and artificial intelligence to maximize... ...acquisition-strategy using internal and public datasets; benchmark and validate model performance. Integrate ML...
Merge Labs, Inc.
San Francisco, CA
3 days ago
Machine Learning Research Scientist
$150k - $300k
...across time, causality, and context. As a Research Scientist, you will tackle fundamental problems in... .... Key Responsibilities Build LLM-powered information extraction pipelines... ...architectural exploration. ~ Strong ML and NLP foundation, particularly in information...
Relocation package
Flexible hours
Dynamis Labs
San Francisco, CA
1 day ago
Member of Technical Staff - Research Scientist
Member of Technical Staff - Research Scientist Patronus AI is a frontier lab developing... ...influential research in AI evaluation like FinanceBench, Lynx,... ...outcomes such as papers, benchmarks, datasets, and platform... ...code in Python and modern ML frameworks. Ability to execute...
Patronus AI, Inc.
San Francisco, CA
3 days ago
Machine Learning Researcher, Audio
$160k - $250k
Machine Learning Researcher, Audio Location: San Francisco, CA or Remote... ...metrics and perceptual evaluations. Validate ideas quickly through... ...Comfortable working with both offline benchmarks and live production metrics.... ...or telephony. PhD in ML, AI, or a related field, or...
Work at office
Remote work
Bland
San Francisco, CA
5 days ago
Multimodal LLM Researcher: Real-Time Speech & Tools
$180k - $260k
BLAND is seeking a Machine Learning Researcher focused on multimodal LLM technology. The role involves developing conversational AI models that integrate speech, text, and real-time reasoning. Candidates should possess a strong background in machine learning, LLMs, and...
BLAND
San Francisco, CA
5 days ago
Machine Learning Research Scientist/Engineer, Agents Research
$259.2k - $324k
Staff Machine Learning Research Scientist/Engineer, Agents Join the team shaping the future of... ...upon our prior model evaluation work with enterprise customers... ...published research in top ML venues (e.g., ACL, EMNLP,... ...experience with open source LLM fine-tuning or involvement...
Full time
Scale AI, Inc.
San Francisco, CA
2 days ago