Research Engineer, AI Capabilities & Evaluations

The Consulting Solutions

The Consulting Solutions is seeking a Research Engineer / Scientist to join the North Stars team. In this role, you will work on enhancing AI-enabled experiences, focusing on improving model capability and performance. You will pursue a comprehensive research agenda while collaborating closely with other teams and building evaluations to track improvements. This position offers a hybrid work model of three days in-office per week and includes relocation assistance for new employees.

Vacancy posted 2 days ago

Similar jobs that could be interesting for you, based on the Research Engineer, AI Capabilities & Evaluations vacancy in San Francisco, CA

Research Engineer — AGI Evaluation & On-Device Perf
...looking for a skilled professional to build evaluation harnesses that ensure models and agents... ..., and develop tooling to assist research and product teams. The position emphasizes... ...performance metrics to improve AI capabilities. You'll need to have a firm grasp on non...
Relocation package
AGI, Inc.
San Francisco, CA
5 days ago
Research Engineer, RSP Evaluations (Autonomy)
$315k
We are looking for Research Engineers to build “gold standard” evaluations for catastrophic risks, in order to understand what AI Safety Level (ASL) to assign to models. Research leads... ...RSP). The policy defines a series of capability thresholds - AI Safety Levels (ASLs)...
Currently hiring
Work at office
Immediate start
Home office
Visa sponsorship
Relocation package
Anthropic
San Francisco, CA
5 days ago
Benchmarking Research Engineer: Frontier Model Evaluations
Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning...
Full time
Refresh AI
San Francisco, CA
5 days ago
Research Engineer, AGI Evaluation & On-Device Metrics
AI Chopping Block, Inc. is searching for a dedicated professional to help build the evaluation harness necessary for our advanced AGI models. You will audit existing processes,... ...into actionable strategies and elevate our research standards, leading to impactful AI...
AI Chopping Block, Inc.
San Francisco, CA
4 days ago
Research Engineer — AI Evaluation & Release Readiness
...Francisco, is seeking a dedicated professional for a full-time role to evaluate agent models and develop practical assessment rubrics. This... ...to aid decision-making. This role is pivotal to ensuring product quality and enhancing the research strategy.
Full time
Relocation package
AGI Inc
San Francisco, CA
1 day ago
Research Engineer
...training and scaling security AI agents to discover zero-days... ...'re seeking an experienced Research Engineer to join our effort in... ...We are building a technology capable of finding the next Log4J at... ...intuition, experience in model evaluation, and benchmarks. Reinforcement...
Full time
Work at office
DepthFirst
San Francisco, CA
4 days ago
Senior AI Research Engineer - RAG & GenAI Evaluation
Drata is seeking a Senior Applied Research Engineer to enhance the quality of AI systems through rigorous evaluation and experimentation. This role emphasizes applied research, focusing on information retrieval and reasoning strategies. The ideal candidate will bring 5...
jobr.pro
San Francisco, CA
5 days ago
Research Engineer/Scientist - Human Alignment, Consumer Devices
$380k
...The Future of Computing Research team is an applied research... ...methods, models, and evaluation frameworks that support... ...frontier of multimodal AI, helping turn emerging model capabilities into product experiences... ...closely across research, engineering, design, product, and safety...
Work at office
Immediate start
Relocation package
OpenAI
San Francisco, CA
4 days ago
Research Engineer (Universes)
...interpretable, and steerable AI systems. We want AI to be... ...growing group of committed researchers, engineers, policy experts, and business... ...of training environments for capable and safe agentic AI. This role... ...of the art, and building evaluations that measure genuine capability...
Work at office
Remote work
Visa sponsorship
Shift work
Menlo Ventures
San Francisco, CA
1 day ago
Research - engineering
...Analysis is a security research lab focused on adversarial simulations, evaluations, and runtime... ...work across research, engineering, and product. About the... ...models for adversarial capabilities using reinforcement learning... ...build deep context in AI security. You are results...
General Analysis
San Francisco, CA
1 day ago
Research Engineer, Performance RL (Reinforcement Learning)
$350k
...interpretable, and steerable AI systems. We want AI to be... ...growing group of committed researchers, engineers, policy experts, and business... ...on the autonomy and coding capabilities of Claude Sonnet 4.6 and... ...implement RL environments and evaluations. Conduct experiments and...
Work at office
Visa sponsorship
Flexible hours
Menlo Ventures
San Francisco, CA
1 day ago
Research Engineer, Applied Finetuning
$315k
As a Research Engineer or Research Scientist in Applied Finetuning, you will... ...to the public via Claude.AI and our API. In this role, you... ...on data mixes, design evaluations, and improve our production... ...that tests Claude’s reasoning capabilities Collaborate with a research...
Work at office
Home office
Visa sponsorship
Relocation package
Anthropic
San Francisco, CA
3 days ago
Research Engineer, Benchmarking
Engineering, San Francisco, Full-time. Build the... ...coding and computer-use capability. Translate expert workflows into rigorous, verifiable evaluations, run them against frontier models... ...fine-tuning at a high level.
Refresh AI
San Francisco, CA
5 days ago
Research Engineer
...currently on-site) Industry: AI infrastructure /... ...Learning (RL) training data & evaluations Compensation: Competitive (range... ...Opportunity Our partner is hiring a Research Engineer to help scale the quality... ...with modern AI tooling and LLM capabilities Equal Opportunity &...
Remote work
talentpluto
San Francisco, CA
1 day ago
Research Engineer
...Archive Human Archive is a research lab backed by Y... ...function gains in model capability. The deployment of capable... ...As a Research Engineer, you’ll work on multimodal... ...research for embodied AI and robotics. This role... ...design experiments, evaluate new sensing stacks, and...
Shift work
Human Archive
San Francisco, CA
4 days ago
Senior Research Engineer
$300k - $400k
...leading conversational AI platform empowering... .... About the Team The Research team develops the model... ...prompting, orchestration, and evaluation in order to make our... ...As a Senior Research Engineer, you’ll be responsible... ...agent’s reliability, capability, and efficiency...
Work at office
Decagon
San Francisco, CA
2 days ago
Research Engineer, Codex
...building state-of-the-art AI systems that can write code... ...reasoning, and deploy these capabilities in real-world products such... ...coding. We operate across research, engineering, product, and infrastructure... ...model training, alignment, and evaluation. Hunt down and address...
Work at office
Relocation package
OpenAI
San Francisco, CA
1 day ago
Research Engineering Tech.
At Capably, we’re building technology that helps businesses operate... ...seamless automation. As a Research Engineer at Capably, you’ll help... ...developing the models, systems, and evaluation approaches that make agentic... ...what today’s enterprise AI tools can reliably deliver....
Capably
San Francisco, CA
5 days ago
Research - engineering
...the Team The Privacy Engineering Team at OpenAI is committed... ...engineering and research partners with the necessary... ...and efficiency of our AI systems. You will help... ...internal libraries, evaluation suites, and... ...pushing the frontiers of capability. About OpenAI OpenAI...
Relocation package
OpenAI
San Francisco, CA
4 days ago
Research - engineering
$190k - $320k
Research Engineer - Computer Vision & Machine Learning Want to build vision... .... Vision is a core capability. Your work will directly influence... ...architectures, training pipelines, evaluation frameworks, and inference... ...vision systems that connect AI to the physical world in...
Trades Workforce Solutions
San Francisco, CA
5 days ago
Research Engineer / Research Scientist - Personal AGI, Proactivity
$295k
Research Engineer / Research Scientist - Personal AGI, Proactivity. Post-training... ...technical foundations for AI that can anticipate what... ...personalization and agentic capabilities. Our team works on reinforcement... ...learning, dataset creation, evaluations, and other post-training...
Work at office
Relocation package
Shift work
OpenAI
San Francisco, CA
1 day ago
Applied Research Engineer, Agents
The Role: As an Applied Research Engineer, you will serve as the crucial link... ...in enabling agentic capabilities across the Hebbia product suite... ...learning systems, and LLM evaluation; experience building with foundation... ...products. Frequent user of AI products, especially during...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
2 days ago
Research Engineer, RL Infrastructure (Knowledge Work)
$350k
...interpretable, and steerable AI systems. We want AI to be... ...growing group of committed researchers, engineers, policy experts, and business... ...training environments and evaluations that make Claude effective... ...processes for Knowledge Work capabilities, including the process used...
Visa sponsorship
Shift work
United States Digital Space LLC
San Francisco, CA
20 hours ago
Research Engineer / Scientist, Societal Impacts
$350k
...reliable, interpretable, and steerable AI systems. We want AI to be safe and... ...quickly growing group of committed researchers, engineers, policy experts, and business... ...values do our systems have?), and evaluating novel AI capabilities as they arise. We develop privacy-preserving...
Full time
Contract work
For contractors
For subcontractor
Work at office
Visa sponsorship
Flexible hours
Menlo Ventures
San Francisco, CA
2 days ago
Principal Research Engineer - Code
...Turing is the world’s leading research accelerator for frontier AI labs and a trusted... ...create RL environments to evaluate and improve our customers... ...vary depending on the model capability being evaluated /... ...Environments for Software Engineering / coding agents UI-Environments...
For contractors
Flexible hours
Cerebras
San Francisco, CA
1 day ago
Applied Research Engineer (Agents)
$160k - $300k
Hebbia is the AI platform for investors and bankers... ...and retrieval capabilities - unlocking meaningful... ...and deep, multi-source research. We’ve built our own agentic... ...LLM inference engine - a distributed, asynchronous... ...systems, and LLM evaluation; experience building with...
Contract work
For contractors
For subcontractor
Work at office
Hebbia
San Francisco, CA
4 days ago
Research Engineer/Research Scientist - Personal AGI, North Stars
About the Role: You’ll work as a Research Engineer / Scientist on the North... ...bring the next generation of AI‑enabled experiences to all of humanity by closing the capability overhang between power users... ...these insights into robust evaluations, training data, reward signals...
Work at office
Relocation package
The Consulting Solutions
San Francisco, CA
2 days ago
Research Engineer, Infrastructure
...Are We are an applied AI lab building end-to-end... ...the first AI software engineer, and Windsurf, an AI-... ...former founders, and researchers from the frontier of AI... .... Every training run, evaluation loop, and experimental... ...more about demonstrated capability than credentials. A...
Cognition
San Francisco, CA
3 days ago
Research Engineer / Scientist, Alignment Science
$280k
...interpretable, and steerable AI systems. We want AI to be... ...growing group of committed researchers, engineers, policy experts, and business... ...the context of human-level capabilities. You could describe... ...Build tooling to efficiently evaluate the effectiveness of novel...
Contract work
For contractors
For subcontractor
Work at office
Relocation
Visa sponsorship
Work visa
Flexible hours
Menlo Ventures
San Francisco, CA
4 days ago
Research Engineer, Production Model Post-Training
$315k
...interpretable, and steerable AI systems. We want AI to be... ...growing group of committed researchers, engineers, policy experts, and business... ...processes to enhance their capabilities, alignment, and safety. As... ...for model fine-tuning and evaluation Develop tools to measure and...
Work at office
Visa sponsorship
Flexible hours
Menlo Ventures
San Francisco, CA
2 days ago