Research, Evals
Exa Labs
Exa ML Evaluations Engineer
Exa is an applied AI lab building a search engine unlike any the world has ever seen. We build massive-scale infra to crawl the entire web, train state-of-the-art embedding models to process it, and design high-performance vector databases to retrieve over it. We now power search for Cursor, Cognition, HubSpot, and over 400,000 developers, and we have raised $350m from Lightspeed, Benchmark, and a16z.
Our ultimate goal is to build perfect search over all the world's information, far beyond Google. If you want to build massive-scale ML systems that will define the way the new AI world consumes information, this is the place for you.
Research at Exa
The ML organization sits at the heart of our mission. We train foundational models for search. Our goal is to build systems that can instantly filter the world's knowledge to exactly what you want, no matter how complex your query. In short, we want to put the web into an extremely powerful database.
And to do that well, we need to measure what "good search" actually means. That's where you come in.
We're looking for an ML evals engineer to design and build our eval stack at Exa. The role involves investigating how to evaluate search engines in an LLM world and then building the most comprehensive, creative, and effective eval suite. You will be deciding the future of search through the evals we choose to optimize for - your work will directly influence what the research team works on and shape the direction of the company.
Who You Are
Have hands-on ML experience: training, finetuning, or evaluating models (bonus if related to embeddings or LLMs)
Have strong engineering fundamentals and can build reliable systems (Python, Rust, distributed pipelines, GPU/cluster jobs, etc.)
Enjoy diving into data by building eval sets, inspecting edge cases, and designing creative measurement strategies
What You Could Do
Write a manifesto of what perfect search means
Design and implement evaluation frameworks that probe the limits of search
Build scalable, reliable eval pipelines that track regressions, drift, and quality signals across billions of documents
Create golden datasets, synthetic benchmarks, agentic tasks, and real-world test suites that reflect how developers, agents, and humans actually use Exa
Partner closely with ML researchers, data engineers, infra engineers, and product to shape the feedback loops that improve our search models
Logistics
Location: This is an in-person opportunity in San Francisco.
Visas: We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3). While we cannot guarantee your visa, we have historically been successful in sponsoring candidates from all over the world. If you receive an offer, our team will work hard to get you a visa.
Benefits: We offer premium healthcare benefits (medical, dental, vision), fertility benefits, 16 weeks of fully paid parental leave for all new parents, and a monthly wellness stipend to all of our employees.
$240k - $380k


