Research Program Manager - Model Evals and Safety

Reflection AI, Inc

Our Mission

Reflection's mission is to build open superintelligence and make it accessible to all.

We're developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and beyond.

About the Role

Research Program Managers at Reflection are high-leverage leaders and operators who embed directly with research and infrastructure teams to accelerate the pace of frontier model development. They are not project trackers. They are force multipliers who bring clarity to ambiguity, drive decisions when the path forward is unclear, and ensure that the work happening across multiple teams connects into a coherent whole.

This is a foundational role. Reflection is building model evals and safety from the ground up, and this RPM will be at the center of that effort. You won't be stepping into an established function with existing processes and tooling. You will be the person who figures out what this function needs to look like, stands it up, and makes it real. That means defining the evaluation frameworks, building the operational infrastructure for model safety, establishing the processes that connect evals to the model development lifecycle, and laying the groundwork for how Reflection interfaces with the broader safety ecosystem. This is 0-to-1 work in its purest form.

You bring a first-responder mentality. When things go sideways, you don't wait to be asked. You jump in, assess the situation, cut through noise, align the people who need to be aligned, and drive resolution.

What You'll Do

Build the foundational infrastructure for model evals and safety at Reflection. Define the evaluation frameworks, tooling requirements, and operational processes that will underpin how we assess model capabilities, risks, and readiness for release.
Stand up model safety operations as a function, including establishing the workflows, review cadences, and decision frameworks that connect safety evaluation to the model development and release lifecycle.
Partner with research and engineering leads across pre-training, mid-training, and post-training to embed safety and evaluation checkpoints into the development process in a way that is rigorous without being a bottleneck.
Drive the scoping and prioritization of eval science and eval infrastructure investments, working with technical leads to determine what to build in-house, what to adopt, and where to invest research effort.
Establish Reflection's engagement with the external safety ecosystem, including third-party assessments, academic partnerships, and industry safety frameworks. Represent the company's safety posture to external stakeholders with credibility and clarity.
Create visibility and reporting structures that give leadership a clear, honest picture of model safety status, evaluation coverage, and open risks, so they can make informed decisions at the pace the business requires.
Champion a culture of blameless post-mortems and continuous learning, turning every safety-relevant finding into a concrete improvement to our systems and processes.

About You

7+ years of experience in technical program management, research operations, or ML engineering, with demonstrated experience standing up new functions, teams, or programs from scratch.
Familiar with the landscape of model evaluation and AI safety, including evaluation methodologies, red-teaming, alignment research, and the evolving regulatory and industry safety ecosystem. You don't need to be a safety researcher, but you need to understand the space well enough to make sound judgments about what matters and what to prioritize.
Deep enough technically to engage with researchers and engineers on topics like model behavior, evaluation design, data pipelines, and safety-critical system architecture. You follow the technical thread and you know when something doesn't add up.
Proven ability to build structure where none exists. You've taken ambiguous mandates and turned them into functioning programs with clear ownership, measurable outcomes, and durable processes.
Strong stakeholder management skills spanning deeply technical ICs, research leadership, and external partners. You build trust through competence and follow-through.
Excited to build from zero to one. We are a small, fast-moving team, and this role will help define how model safety and evaluation work at Reflection.
Motivated by enabling researchers and engineers to build the world's most capable open-weight AI systems, responsibly.

What We Offer

We believe that to build superintelligence that is truly open, you need to start at the foundation. Joining Reflection means building from the ground up as part of a small, talent-dense team. You will help define our future as a company and the frontier of open foundational models.

We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.

Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
Opportunities to connect with teammates: Lunch and dinner are provided daily. We have regular off-sites and team celebrations.

Vacancy posted 2 days ago
