Research, Evals

Exa Labs

Exa ML Evaluations Engineer

Exa is an applied AI lab building a search engine unlike any the world has ever seen. We build massive-scale infra to crawl the entire web, train state-of-the-art embedding models to process it, and design high-performance vector databases to retrieve over it. We now power search for Cursor, Cognition, HubSpot, and over 400,000 developers and have raised $350m from Lightspeed, Benchmark, and a16z.

Our ultimate goal is to build perfect search over all the world's information, far beyond Google. If you want to build massive-scale ML systems that will define the way the new AI world consumes information, this is the place for you.

Research at Exa

The ML organization sits at the heart of our mission. We train foundational models for search. Our goal is to build systems that can instantly filter the world's knowledge to exactly what you want, no matter how complex your query. In short, we want to put the web into an extremely powerful database.

And to do that well, we need to measure what "good search" actually means. That's where you come in.

We're looking for an ML evals engineer to design and build our eval stack at Exa. The role involves investigating how to evaluate search engines in an LLM world and then building the most comprehensive, creative, and effective eval suite. You will be deciding the future of search through the evals we choose to optimize for: your work will directly influence what the research team works on and shape the direction of the company.

Who You Are
  • Have hands-on ML experience: training, finetuning, or evaluating models (bonus if related to embeddings or LLMs)

  • Have strong engineering fundamentals and can build reliable systems (Python, Rust, distributed pipelines, GPU/cluster jobs, etc.)

  • Enjoy diving into data by building eval sets, inspecting edge cases, and designing creative measurement strategies

What You Could Do
  • Write a manifesto of what perfect search means

  • Design and implement evaluation frameworks that probe the limits of search

  • Build scalable, reliable eval pipelines that track regressions, drift, and quality signals across billions of documents

  • Create golden datasets, synthetic benchmarks, agentic tasks, and real-world test suites that reflect how developers, agents, and humans actually use Exa

  • Partner closely with ML researchers, data engineers, infra engineers, and product to shape the feedback loops that improve our search models

Logistics
  • Location: This is an in-person opportunity in San Francisco.

  • Visas: We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3). While we cannot guarantee your visa, we have historically been successful in sponsoring candidates from all over the world. If you receive an offer, our team will work hard to get you a visa.

  • Benefits: We offer premium healthcare benefits (medical, dental, vision), fertility benefits, 16 weeks of fully paid parental leave for all new parents, and a monthly wellness stipend to all of our employees.

Vacancy posted 4 days ago