Research Engineer, Benchmarking
Refresh AI
- Research Engineer, Benchmarking
Engineering, San Francisco, Full-time
Build the benchmarks frontier labs use to measure real-world coding and computer-use capability. Translate expert workflows into rigorous, verifiable evaluations, run them against frontier models, and publish numbers that hold up under adversarial scrutiny.
Across every role at Refresh: you're willing to ship full-stack work on our core stack (Vercel, Supabase, Render) when it's needed, and you're comfortable with reinforcement learning and supervised fine-tuning at a high level.
Vacancy posted 17 hours ago
Similar jobs that could be interesting for you
Based on the Research Engineer, Benchmarking in San Francisco, CA vacancy
- ...committed team. You’ll work alongside researchers, operators, and AI companies at the forefront... .... About the Role As a Research Engineer at Mercor, you’ll work at the intersection... ...and applied AI research. You’ll own benchmarking pipelines, evaluation systems, and failure... (Suggested, Work at office)
- Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... (Suggested, Full time)
- ...Levy, Kevin Hartz, and others. The Role: We are looking for Research Engineers to build AI systems that use agent interaction data to help... ...following: You care about data quality, evaluation, and benchmarking, and are comfortable working hands‑on with messy data You have... (Suggested, Immediate start)
- ...eliminating complexity and friction with seamless automation. As a Research Engineer at Capably, you’ll help define how intelligent systems... ..., reliability, and adaptability Design evaluations and benchmarks that reflect real enterprise workflows Improve model behaviour... (Suggested)
$180k - $270k
Research Engineer (Focused on RL) You'll bring reinforcement learning to Firecrawl's core product: building the training infrastructure,... ...whether your models actually work in production, not just on benchmarks. You've deployed models that serve real traffic and you've made... (Suggested, Full time, Temporary work, Remote work)
$160k - $240k
Research Engineer, Evals You’ll build the evaluation systems that tell us whether Firecrawl actually works. That sounds simple. It isn’t.... ...This isn’t an eval role where you inherit a framework and run benchmarks. You’ll design the metrics, build the pipelines, generate... (Full time, Temporary work, Remote work)
- ...ownership. Every applied AI company we benchmark against like Decagon, Harvey, Sierra, Cursor... ...scale, every day. We see exactly where research meets production and where the data is... ...alongside elite and competitive engineering minds. Translate findings into infrastructure... (Relocation)
- ...Collaborating with a diverse team, including product managers, researchers, and engineering departments, your role involves conducting research on... ...scalability, efficiency, and reliability. Implement benchmarks that evaluate quality, safety, security, and trustworthiness...
- ...consequence systems problems where the edge cases matter most. We’re looking for a Research Engineer to help define how we measure and improve model quality. You’ll build the benchmarks, datasets, tooling, and evaluation loops that tell us whether our systems are...
- ...own the quality of AI across everything Gamma creates. As our Research Engineer, you'll design evaluation frameworks that measure AI output... ...that enable rapid testing, validate changes against quality benchmarks, and ensure our AI gets smarter with every iteration. If you... (Work at office, Work from home)
- ...Infrastructure. About this role We’re seeking an experienced Research Engineer to join our effort in building and training AI agents for... ...with strong intuition, experience in model evaluation, and benchmarks. Reinforcement Learning experience is a plus. Your work will... (Full time, Work at office)
- ...Own the end-to-end lifecycle of memory features, from research to production. You’ll fine-tune models for... ...pain points into research hypotheses; implement and benchmark ideas from papers; and ship with Engineering to SOTA latency, reliability, and cost. You’ll also...
- ...these models to real-world industry and economy use cases. As a Research Engineer on our Physical AI team, you will lead pre-training and post... ...and action sequences Evaluate model performance using both benchmark datasets and real-world deployment metrics Contributions... (Work at office)
$170k - $230k
...Center for AI Safety (CAIS) is a leading research and advocacy organization focused on... ...Safety Action Fund. As a Senior Research Engineer here, you’ll work at the intersection of... ...design and maintenance of datasets and benchmarks. Run distributed training and evaluation... (Work at office, Local area)
$9.7k - $19k
...Center for AI Safety (CAIS) is a leading research and field-building organization on a... ...societal and policy solutions. As a research engineer intern here, you will work very closely... ..., machine ethics, AI alignment, and benchmarking AI risks. We will assign you a... (Full time, Internship, Local area)
- About the Team The Future of Computing Research team is an applied research team within... ...being. We work closely across research, engineering, design, product, and safety to define... ...‑grounded: success is not just higher benchmark performance, but better model behavior... (Work at office, Immediate start, Relocation package)
$250k - $300k
...infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric... ...Your Impact Create frameworks and tools to construct, train, benchmark and evaluate autonomous agent capabilities. Design agent-... (Work at office, Flexible hours, 2 days per week)
- ...AI Research Scientist We're building the first truly private, personal AI that learns your skills, judgment, and preferences without... ...quality. We're not trying to push up numbers on a public benchmark, we're trying to make models qualitatively good at understanding... (Shift work)
- ...California, Turing is the world’s leading research accelerator for frontier AI labs and a... ...here: Environments for Software Engineering / coding agents UI-Environments for Computer... ...of data quality Proactively build benchmarks and run evals on frontier models and coding... (For contractors, Flexible hours)
$315k - $340k
[Expression of Interest] Research Scientist/Engineer, Honesty About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable... ...by the model Create and maintain comprehensive honesty benchmarks and evaluation frameworks Implement techniques to ground... (Full time, Work at office, Visa sponsorship, Flexible hours)
$220k - $280k
Job Description About the role In your role as Senior Research Engineer, you'll be at the heart of building the next generation of generative... ...based on market data, your experience levels, and internal benchmarks of your peers in the same domain and job level. (Work at office, Local area, Flexible hours)
- ...Research Systems Engineer As a research systems engineer, you'll train frontier-scale models and develop the methods that make continual learning... ...fortunate to be backed by partners like Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks. Who Thrives Here: We're... (Visa sponsorship, Relocation package)
$150k - $250k
Robotics Research Engineer About Us We are reimagining manufacturing through advanced robotics. Our mission is to rebuild the American manufacturing... ...and fine-tune foundation models for assembly tasks Benchmark and evaluate real-world performance Controls + Learning... (Full time, Contract work, Work at office)
$295k - $380k
...About The Team The team works on research and systems that advance frontier models... ...About The Role This is a systems engineering role focused on ML training infrastructure... ..., and storage. Write tests, benchmarks, and diagnostics that catch meaningful...
- ...founded in 2022 to close it. Our open‑source engine, Daft, is the distributed data engine... ...district office. Your Role As a Research Engineer on the Visual Understanding team... ...perception models against customer datasets and benchmarks. Drive down per‑clip annotation cost... (Hourly pay, Work at office, Flexible hours, Night shift, 1 day per week)
$300k
...shape how we work and grow as a team. About the Team The Research team at Decagon innovates on building the most advanced... ...information retrieval. We’re looking for people with strong engineering skills, writing bug-free machine learning code, and building the... (Work at office)
- ...You will consume real-world trajectories or researcher hypotheses, materialize realistic data, propose candidate tasks, benchmark those tasks against frontier computer‑use... ...intersection of empirical AI research, systems engineering, and model evaluation. You may be a strong...
- ...Valley investors. For more information, please visit Role Description We are seeking a creative, hands-on Senior Mechanical Research Engineer with significant vibration and dynamics experience to lead complex mechanical sensing problems on edge grid intelligence...
- ...Platform Research Engineer As a platform research engineer, you'll build the core AI systems that make Applied Compute's platform intelligent... ...systems Background in building evaluation frameworks, benchmarks, or data quality systems Experience with continual... (Visa sponsorship, Relocation package)
- DepthFirst in San Francisco is seeking an experienced Research Engineer. You will build and train AI agents for discovering and remediating... .... Responsibilities include developing evaluation benchmarks and training procedures. The ideal candidate has 3+ years in... (Work at office)