Research, Evals

Exa Labs

Exa ML Evaluations Engineer

Exa is an applied AI lab building a search engine unlike any the world has ever seen. We build massive-scale infra to crawl the entire web, train state-of-the-art embedding models to process it, and design high-performance vector databases to retrieve over it. We now power search for Cursor, Cognition, HubSpot, and over 400,000 developers, and have raised $350m from Lightspeed, Benchmark, and a16z.

Our ultimate goal is to build perfect search over all the world's information, far beyond Google. If you want to build massive-scale ML systems that will define the way the new AI world consumes information, this is the place for you.

Research at Exa

The ML organization sits at the heart of our mission. We train foundational models for search. Our goal is to build systems that can instantly filter the world's knowledge to exactly what you want, no matter how complex your query. In short, we want to put the web into an extremely powerful database.

And to do that well, we need to measure what "good search" actually means. That's where you come in.

We're looking for an ML evals engineer to design and build our eval stack at Exa. The role involves investigating how to evaluate search engines in an LLM world and then building the most comprehensive, creative, and effective eval suite. You will be deciding the future of search through the evals we choose to optimize for. Your work will directly influence what the research team works on and shape the direction of the company.

Who You Are

Have hands-on ML experience (training, fine-tuning, or evaluating models); bonus if related to embeddings or LLMs
Have strong engineering fundamentals and can build reliable systems (Python, Rust, distributed pipelines, GPU/cluster jobs, etc.)
Enjoy diving into data: building eval sets, inspecting edge cases, and designing creative measurement strategies

What You Could Do

Write a manifesto of what perfect search means
Design and implement evaluation frameworks that probe the limits of search
Build scalable, reliable eval pipelines that track regressions, drift, and quality signals across billions of documents
Create golden datasets, synthetic benchmarks, agentic tasks, and real-world test suites that reflect how developers, agents, and humans actually use Exa
Partner closely with ML researchers, data engineers, infra engineers, and product to shape the feedback loops that improve our search models

Logistics

Location: This is an in-person opportunity in San Francisco.
Visas: We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3). While we cannot guarantee your visa, we have historically been successful in sponsoring candidates from all over the world. If you receive an offer, our team will work hard to get you a visa.
Benefits: We offer premium healthcare benefits (medical, dental, vision), fertility benefits, 16 weeks of fully paid parental leave for all new parents, and a monthly wellness stipend to all of our employees.

Vacancy posted 4 days ago