Research Program Manager - Model Evals and Safety
Reflection AI, Inc
Our Mission
Reflection's mission is to build open superintelligence and make it accessible to all. We're developing open weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.

About the Role
Research Program Managers at Reflection are high-leverage leaders and operators who embed directly with research and infrastructure teams to accelerate the pace of frontier model development. They are not project trackers. They are force multipliers who bring clarity to ambiguity, drive decisions when the path forward is unclear, and ensure that the work happening across multiple teams connects into a coherent whole.

This is a foundational role. Reflection is building model evals and safety from the ground up, and this RPM will be at the center of that effort. You won't be stepping into an established function with existing processes and tooling. You will be the person who figures out what this function needs to look like, stands it up, and makes it real. That means defining the evaluation frameworks, building the operational infrastructure for model safety, establishing the processes that connect evals to the model development lifecycle, and laying the groundwork for how Reflection interfaces with the broader safety ecosystem. This is 0-to-1 work in its purest form.

You bring a first-responder mentality. When things go sideways, you don't wait to be asked. You jump in, assess the situation, cut through noise, align the people who need to be aligned, and drive resolution.

What You'll Do
- Build the foundational infrastructure for model evals and safety at Reflection. Define the evaluation frameworks, tooling requirements, and operational processes that will underpin how we assess model capabilities, risks, and readiness for release.
- Stand up model safety operations as a function, including establishing the workflows, review cadences, and decision frameworks that connect safety evaluation to the model development and release lifecycle.
- Partner with research and engineering leads across pre-training, mid-training, and post-training to embed safety and evaluation checkpoints into the development process in a way that is rigorous without being a bottleneck.
- Drive the scoping and prioritization of eval science and eval infrastructure investments, working with technical leads to determine what to build in-house, what to adopt, and where to invest research effort.
- Establish Reflection's engagement with the external safety ecosystem, including third-party assessments, academic partnerships, and industry safety frameworks. Represent the company's safety posture to external stakeholders with credibility and clarity.
- Create visibility and reporting structures that give leadership a clear, honest picture of model safety status, evaluation coverage, and open risks, so they can make informed decisions at the pace the business requires.
- Champion a culture of blameless post-mortems and continuous learning, turning every safety-relevant finding into a concrete improvement to our systems and processes.
- 7+ years of experience in technical program management, research operations, or ML engineering, with demonstrated experience standing up new functions, teams, or programs from scratch.
- Familiar with the landscape of model evaluation and AI safety, including evaluation methodologies, red-teaming, alignment research, and the evolving regulatory and industry safety ecosystem. You don't need to be a safety researcher, but you need to understand the space well enough to make sound judgments about what matters and what to prioritize.
- Deep enough technically to engage with researchers and engineers on topics like model behavior, evaluation design, data pipelines, and safety-critical system architecture. You follow the technical thread and you know when something doesn't add up.
- Proven ability to build structure where none exists. You've taken ambiguous mandates and turned them into functioning programs with clear ownership, measurable outcomes, and durable processes.
- Strong stakeholder management skills spanning deeply technical ICs, research leadership, and external partners. You build trust through competence and follow-through.
- Excited to build from zero to one. We are a small, fast-moving team, and this role will help define how model safety and evaluation work at Reflection.
- Motivated by enabling researchers and engineers to build the world's most capable open-weight AI systems, responsibly.
- Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
- Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
- Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
- Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
- Opportunities to connect with teammates: Lunch and dinner are provided daily. We have regular off-sites and team celebrations.
Vacancy posted 2 days ago
Location: San Francisco, CA

