Research Engineer, Benchmarking

Refresh AI

San Francisco, CA

Engineering
San Francisco
Full-time

Build the benchmarks frontier labs use to measure real-world coding and computer-use capability. Translate expert workflows into rigorous, verifiable evaluations, run them against frontier models, and publish numbers that hold up under adversarial scrutiny.

Across every role at Refresh: you're willing to ship full-stack work on our core stack (Vercel, Supabase, Render) when it's needed, and you're comfortable with reinforcement learning and supervised fine-tuning at a high level.

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Research Engineer, Benchmarking in San Francisco, CA vacancy

Research Engineer - Benchmarking, Evals & Failure Analysis
...committed team. You’ll work alongside researchers, operators, and AI companies at the forefront... .... About the Role As a Research Engineer at Mercor, you’ll work at the intersection... ...and applied AI research. You’ll own benchmarking pipelines, evaluation systems, and failure...
Suggested
Work at office
Mercor
San Francisco, CA
3 days ago
Benchmarking Research Engineer: Frontier Model Evaluations
Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning...
Suggested
Full time
Refresh AI
San Francisco, CA
1 day ago
Research Engineer
...eliminating complexity and friction with seamless automation. As a Research Engineer at Capably, you’ll help define how intelligent systems... ..., reliability, and adaptability Design evaluations and benchmarks that reflect real enterprise workflows Improve model behaviour...
Suggested
Capably
San Francisco, CA
1 day ago
Research Engineer — Reinforcement Learning
$180k - $270k
Research Engineer (Focused on RL) You'll bring reinforcement learning to Firecrawl's core product — building the training infrastructure,... ...whether your models actually work in production, not just on benchmarks. You've deployed models that serve real traffic and you've made...
Suggested
Full time
Temporary work
Remote work
Firecrawl
San Francisco, CA
1 day ago
Research Engineer
...Levy, Kevin Hartz, and others. The Role: We are looking for Research Engineers to build AI systems that use agent interaction data to help... ...following: You care about data quality, evaluation, and benchmarking, and are comfortable working hands‑on with messy data You have...
Suggested
Immediate start
Judgment Labs Inc.
San Francisco, CA
2 days ago
Research Engineer - Evals
$160k - $240k
Research Engineer — Evals You’ll build the evaluation systems that tell us whether Firecrawl actually works. That sounds simple. It isn’t.... ...This isn’t an eval role where you inherit a framework and run benchmarks. You’ll design the metrics, build the pipelines, generate...
Full time
Temporary work
Remote work
Firecrawl
San Francisco, CA
4 days ago
Founding Applied Research Engineer
...ownership. Every applied AI company we benchmark against like Decagon, Harvey, Sierra, Cursor... ...scale, every day. We see exactly where research meets production and where the data is... ...alongside elite and competitive engineering minds. Translate findings into infrastructure...
Relocation
Rox Data Corp
San Francisco, CA
3 days ago
Research Engineer
...own the quality of AI across everything Gamma creates. As our Research Engineer, you'll design evaluation frameworks that measure AI output... ...that enable rapid testing, validate changes against quality benchmarks, and ensure our AI gets smarter with every iteration. If you...
Work at office
Work from home
gamma.app
San Francisco, CA
3 days ago
Senior Research Engineer
...Own the end-to-end lifecycle of memory features—from research to production. You’ll fine-tune models for... ...pain points into research hypotheses; implement and benchmark ideas from papers; and ship with Engineering to SOTA latency, reliability, and cost. You’ll also...
Mem0
San Francisco, CA
1 day ago
Research Engineer, Evals
...consequence systems problems where the edge cases matter most. We’re looking for a Research Engineer to help define how we measure and improve model quality. You’ll build the benchmarks, datasets, tooling, and evaluation loops that tell us whether our systems are...
Variance
San Francisco, CA
3 days ago
Research Engineer
...Infrastructure. About this role We’re seeking an experienced Research Engineer to join our effort in building and training AI agents for... ...with strong intuition, experience in model evaluation, and benchmarks. Reinforcement Learning experience is a plus. Your work will...
Full time
Work at office
DepthFirst
San Francisco, CA
2 days ago
Senior Research Engineer
$170k - $230k
...Center for AI Safety (CAIS) is a leading research and advocacy organization focused on... ...Safety Action Fund. As a Senior Research Engineer here, you’ll work at the intersection of... ...design and maintenance of datasets and benchmarks. Run distributed training and evaluation...
Work at office
Local area
Center for AI Safety
San Francisco, CA
3 days ago
Research Engineer Intern (Fall 2026)
$9.7k - $19k
...Center for AI Safety (CAIS) is a leading research and field-building organization on a... ...societal and policy solutions. As a research engineer intern here, you will work very closely... ..., machine ethics, AI alignment, and benchmarking AI risks. We will assign you a...
Full time
Internship
Local area
Center for AI Safety
San Francisco, CA
4 days ago
Research Engineer
...these models to real-world industry and economy use cases. As a Research Engineer on our Physical AI team, you will lead pre-training and post... ...and action sequences Evaluate model performance using both benchmark datasets and real-world deployment metrics Contributions...
Work at office
Hedra, Inc
San Francisco, CA
5 days ago
Research Engineer - Machine Learning (ML)
...Collaborating with a diverse team, including product managers, researchers, and engineering departments, your role involves conducting research on... ...scalability, efficiency, and reliability. Implement benchmarks that evaluate quality, safety, security, and trustworthiness...
Kubelt
San Francisco, CA
3 days ago
Applied Research Engineer (Agents)
$250k - $300k
...infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric... ...Your Impact Create frameworks and tools to construct, train, benchmark and evaluate autonomous agent capabilities. Design agent-...
Work at office
Flexible hours
2 days per week
Labelbox
San Francisco, CA
5 days ago
Research Engineer/Scientist - Human Alignment, Consumer Devices
About the Team The Future of Computing Research team is an applied research team within... ...being. We work closely across research, engineering, design, product, and safety to define... ...‑grounded: success is not just higher benchmark performance, but better model behavior...
Work at office
Immediate start
Relocation package
Slope
San Francisco, CA
3 days ago
[Expression of Interest] Research Scientist / Engineer, Honesty
$315k - $340k
...a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build... ...by the model. Create and maintain comprehensive honesty benchmarks and evaluation frameworks. Implement techniques to ground...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
2 days ago
Senior Research Engineer - Video Agents
$220k - $280k
Job Description About the role In your role as Senior Research Engineer, you'll be at the heart of building the next generation of generative... ...based on market data, your experience levels, and internal benchmarks of your peers in the same domain and job level. #J-18808-...
Work at office
Local area
Flexible hours
black.ai
San Francisco, CA
3 days ago
Principal Research Engineer - Code
...California, Turing is the world’s leading research accelerator for frontier AI labs and a... ...here: Environments for Software Engineering / coding agents UI-Environments for Computer... ...of data quality Proactively build benchmarks and run evals on frontier models and coding...
For contractors
Flexible hours
Cerebras
San Francisco, CA
2 days ago
Machine Learning Research Engineer
...AI Research Scientist We're building the first truly private, personal AI that learns your skills, judgment, and preferences without... ...quality. We're not trying to push up numbers on a public benchmark, we're trying to make models qualitatively good at understanding...
Shift work
Workshop Labs
San Francisco, CA
1 day ago
Research Systems Engineer
...Research Systems Engineer As a research systems engineer, you'll train frontier-scale models and develop the methods that make continual learning... ...fortunate to be backed by partners like Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks. Who Thrives Here: We're...
Visa sponsorship
Relocation package
Applied Compute
San Francisco, CA
4 days ago
Robotics Research Engineer
$150k - $250k
Robotics Research Engineer About Us We are reimagining manufacturing through advanced robotics. Our mission is to rebuild the American manufacturing... ...and fine-tune foundation models for assembly tasks Benchmark and evaluate real-world performance Controls + Learning...
Full time
Contract work
Work at office
Foundry Robotics Inc.
San Francisco, CA
4 days ago
Research Infrastructure Engineer, Training Systems
$295k - $380k
...About The Team The team works on research and systems that advance frontier models... ...About The Role This is a systems engineering role focused on ML training infrastructure... ..., and storage. Write tests, benchmarks, and diagnostics that catch meaningful...
OpenAI
San Francisco, CA
3 days ago
Research Engineer, Multimodal Data
...founded in 2022 to close it. Our open‑source engine, Daft, is the distributed data engine... ...district office. Your Role As a Research Engineer on the Visual Understanding team... ...perception models against customer datasets and benchmarks. Drive down per‑clip annotation cost —...
Hourly pay
Work at office
Flexible hours
Night shift
1 day per week
Eventual
San Francisco, CA
4 days ago
Member of Technical Staff, Research Engineer
...You will consume real-world trajectories or researcher hypotheses, materialize realistic data, propose candidate tasks, benchmark those tasks against frontier computer‑use... ...intersection of empirical AI research, systems engineering, and model evaluation. You may be a strong...
Plato
San Francisco, CA
5 days ago
Staff Research Engineer
$300k
...shape how we work and grow as a team. About the Team The Research team at Decagon innovates on building the most advanced... ...information retrieval. We’re looking for people with strong engineering skills, writing bug-free machine learning code, and building the...
Work at office
Decagon
San Francisco, CA
1 day ago
Senior Research Engineer, Structural Dynamics & Vibrations
...Valley investors. For more information, please visit Role Description We are seeking a creative, hands-on Senior Mechanical Research Engineer with significant vibration and dynamics experience to lead complex mechanical sensing problems on edge grid intelligence...
Gridware
San Francisco, CA
8 days ago
Platform Research Engineer
...Platform Research Engineer As a platform research engineer, you'll build the core AI systems that make Applied Compute's platform intelligent... ...systems Background in building evaluation frameworks, benchmarks, or data quality systems Experience with continual...
Visa sponsorship
Relocation package
Applied Compute
San Francisco, CA
4 days ago
AI Security Research Engineer: Vulnerability Discovery
DepthFirst in San Francisco is seeking an experienced Research Engineer. You will build and train AI agents for discovering and remediating... .... Responsibilities include developing evaluation benchmarks and training procedures. The ideal candidate has 3+ years in...
Work at office
DepthFirst
San Francisco, CA
2 days ago