Research Engineer, Benchmarking

Refresh AI

  • Research Engineer, Benchmarking
    Engineering
    San Francisco
    Full-time

    Build the benchmarks frontier labs use to measure real-world coding and computer-use capability. Translate expert workflows into rigorous, verifiable evaluations, run them against frontier models, and publish numbers that hold up under adversarial scrutiny.

    Across every role at Refresh: you're willing to ship full-stack work on our core stack (Vercel, Supabase, Render) when it's needed, and you're comfortable with reinforcement learning and supervised fine-tuning at a high level.

Vacancy posted 17 hours ago
Similar jobs that could be interesting for you
Based on the Research Engineer, Benchmarking in San Francisco, CA vacancy
  •  ...committed team. You’ll work alongside researchers, operators, and AI companies at the forefront...  .... About the Role As a Research Engineer at Mercor, you’ll work at the intersection...  ...and applied AI research. You’ll own benchmarking pipelines, evaluation systems, and failure... 
    Suggested
    Work at office

    Mercor

    San Francisco, CA
    2 days ago
  • Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... 
    Suggested
    Full time

    Refresh AI

    San Francisco, CA
    17 hours ago
  •  ...Levy, Kevin Hartz, and others. The Role: We are looking for Research Engineers to build AI systems that use agent interaction data to help...  ...following: You care about data quality, evaluation, and benchmarking, and are comfortable working hands‑on with messy data You have... 
    Suggested
    Immediate start

    Judgment Labs Inc.

    San Francisco, CA
    1 day ago
  •  ...eliminating complexity and friction with seamless automation. As a Research Engineer at Capably, you’ll help define how intelligent systems...  ..., reliability, and adaptability Design evaluations and benchmarks that reflect real enterprise workflows Improve model behaviour... 
    Suggested

    Capably

    San Francisco, CA
    17 hours ago
  • $180k - $270k

    Research Engineer (Focused on RL) You'll bring reinforcement learning to Firecrawl's core product — building the training infrastructure,...  ...whether your models actually work in production, not just on benchmarks. You've deployed models that serve real traffic and you've made... 
    Suggested
    Full time
    Temporary work
    Remote work

    Firecrawl

    San Francisco, CA
    17 hours ago
  • $160k - $240k

    Research Engineer — Evals You’ll build the evaluation systems that tell us whether Firecrawl actually works. That sounds simple. It isn’t....  ...This isn’t an eval role where you inherit a framework and run benchmarks. You’ll design the metrics, build the pipelines, generate... 
    Full time
    Temporary work
    Remote work

    Firecrawl

    San Francisco, CA
    3 days ago
  •  ...ownership. Every applied AI company we benchmark against like Decagon, Harvey, Sierra, Cursor...  ...scale, every day. We see exactly where research meets production and where the data is...  ...alongside elite and competitive engineering minds. Translate findings into infrastructure... 
    Relocation

    Rox Data Corp

    San Francisco, CA
    2 days ago
  •  ...Collaborating with a diverse team, including product managers, researchers, and engineering departments, your role involves conducting research on...  ...scalability, efficiency, and reliability. Implement benchmarks that evaluate quality, safety, security, and trustworthiness... 

    Kubelt

    San Francisco, CA
    2 days ago
  •  ...consequence systems problems where the edge cases matter most. We’re looking for a Research Engineer to help define how we measure and improve model quality. You’ll build the benchmarks, datasets, tooling, and evaluation loops that tell us whether our systems are... 

    Variance

    San Francisco, CA
    2 days ago
  •  ...own the quality of AI across everything Gamma creates. As our Research Engineer, you'll design evaluation frameworks that measure AI output...  ...that enable rapid testing, validate changes against quality benchmarks, and ensure our AI gets smarter with every iteration. If you... 
    Work at office
    Work from home

    gamma.app

    San Francisco, CA
    2 days ago
  •  ...Infrastructure. About this role We’re seeking an experienced Research Engineer to join our effort in building and training AI agents for...  ...with strong intuition, experience in model evaluation, and benchmarks. Reinforcement Learning experience is a plus. Your work will... 
    Full time
    Work at office

    DepthFirst

    San Francisco, CA
    1 day ago
  •  ...Own the end-to-end lifecycle of memory features, from research to production. You’ll fine-tune models for...  ...pain points into research hypotheses; implement and benchmark ideas from papers; and ship with Engineering to SOTA latency, reliability, and cost. You’ll also... 

    Mem0

    San Francisco, CA
    17 hours ago
  •  ...these models to real-world industry and economy use cases. As a Research Engineer on our Physical AI team, you will lead pre-training and post...  ...and action sequences Evaluate model performance using both benchmark datasets and real-world deployment metrics Contributions... 
    Work at office

    Hedra, Inc

    San Francisco, CA
    4 days ago
  • $170k - $230k

     ...Center for AI Safety (CAIS) is a leading research and advocacy organization focused on...  ...Safety Action Fund. As a Senior Research Engineer here, you’ll work at the intersection of...  ...design and maintenance of datasets and benchmarks. Run distributed training and evaluation... 
    Work at office
    Local area

    Center for AI Safety

    San Francisco, CA
    2 days ago
  • $9.7k - $19k

     ...Center for AI Safety (CAIS) is a leading research and field-building organization on a...  ...societal and policy solutions. As a research engineer intern here, you will work very closely...  ..., machine ethics, AI alignment, and benchmarking AI risks. We will assign you a... 
    Full time
    Internship
    Local area

    Center for AI Safety

    San Francisco, CA
    3 days ago
  • About the Team The Future of Computing Research team is an applied research team within...  ...being. We work closely across research, engineering, design, product, and safety to define...  ...‑grounded: success is not just higher benchmark performance, but better model behavior... 
    Work at office
    Immediate start
    Relocation package

    Slope

    San Francisco, CA
    2 days ago
  • $250k - $300k

     ...infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric...  ...Your Impact Create frameworks and tools to construct, train, benchmark and evaluate autonomous agent capabilities. Design agent-... 
    Work at office
    Flexible hours
    2 days per week

    Labelbox

    San Francisco, CA
    4 days ago
  •  ...AI Research Scientist We're building the first truly private, personal AI that learns your skills, judgment, and preferences without...  ...quality. We're not trying to push up numbers on a public benchmark, we're trying to make models qualitatively good at understanding... 
    Shift work

    Workshop Labs

    San Francisco, CA
    17 hours ago
  •  ...California, Turing is the world’s leading research accelerator for frontier AI labs and a...  ...here: Environments for Software Engineering / coding agents UI-Environments for Computer...  ...of data quality Proactively build benchmarks and run evals on frontier models and coding... 
    For contractors
    Flexible hours

    Cerebras

    San Francisco, CA
    1 day ago
  • $315k - $340k

    [Expression of Interest] Research Scientist/Engineer, Honesty About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable...  ...by the model Create and maintain comprehensive honesty benchmarks and evaluation frameworks Implement techniques to ground... 
    Full time
    Work at office
    Visa sponsorship
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    1 day ago
  • $220k - $280k

    Job Description About the role In your role as Senior Research Engineer, you'll be at the heart of building the next generation of generative...  ...based on market data, your experience levels, and internal benchmarks of your peers in the same domain and job level.
    Work at office
    Local area
    Flexible hours

    black.ai

    San Francisco, CA
    2 days ago
  •  ...Research Systems Engineer As a research systems engineer, you'll train frontier-scale models and develop the methods that make continual learning...  ...fortunate to be backed by partners like Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks. Who Thrives Here: We're... 
    Visa sponsorship
    Relocation package

    Applied Compute

    San Francisco, CA
    3 days ago
  • $150k - $250k

    Robotics Research Engineer About Us We are reimagining manufacturing through advanced robotics. Our mission is to rebuild the American manufacturing...  ...and fine-tune foundation models for assembly tasks Benchmark and evaluate real-world performance Controls + Learning... 
    Full time
    Contract work
    Work at office

    Foundry Robotics Inc.

    San Francisco, CA
    3 days ago
  • $295k - $380k

     ...About The Team The team works on research and systems that advance frontier models...  ...About The Role This is a systems engineering role focused on ML training infrastructure...  ..., and storage. Write tests, benchmarks, and diagnostics that catch meaningful... 

    OpenAI

    San Francisco, CA
    2 days ago
  •  ...founded in 2022 to close it. Our open‑source engine, Daft, is the distributed data engine...  ...district office. Your Role As a Research Engineer on the Visual Understanding team...  ...perception models against customer datasets and benchmarks. Drive down per‑clip annotation cost —... 
    Hourly pay
    Work at office
    Flexible hours
    Night shift
    1 day per week

    Eventual

    San Francisco, CA
    3 days ago
  • $300k

     ...shape how we work and grow as a team. About the Team The Research team at Decagon innovates on building the most advanced...  ...information retrieval. We’re looking for people with strong engineering skills, writing bug-free machine learning code, and building the... 
    Work at office

    Decagon

    San Francisco, CA
    17 hours ago
  •  ...You will consume real-world trajectories or researcher hypotheses, materialize realistic data, propose candidate tasks, benchmark those tasks against frontier computer‑use...  ...intersection of empirical AI research, systems engineering, and model evaluation. You may be a strong... 

    Plato

    San Francisco, CA
    4 days ago
  •  ...Valley investors. For more information, please visit  Role Description We are seeking a creative, hands-on Senior Mechanical Research Engineer with significant vibration and dynamics experience to lead complex mechanical sensing problems on edge grid intelligence... 

    Gridware

    San Francisco, CA
    7 days ago
  •  ...Platform Research Engineer As a platform research engineer, you'll build the core AI systems that make Applied Compute's platform intelligent...  ...systems Background in building evaluation frameworks, benchmarks, or data quality systems Experience with continual... 
    Visa sponsorship
    Relocation package

    Applied Compute

    San Francisco, CA
    3 days ago
  • DepthFirst in San Francisco is seeking an experienced Research Engineer. You will build and train AI agents for discovering and remediating...  .... Responsibilities include developing evaluation benchmarks and training procedures. The ideal candidate has 3+ years in... 
    Work at office

    DepthFirst

    San Francisco, CA
    1 day ago