Research Program Manager - Model Evals and Safety

AI Chopping Block, Inc.

Our Mission

Reflection’s mission is to build open superintelligence and make it accessible to all. We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.

About the Role

Research Program Managers at Reflection are high-leverage leaders and operators who embed directly with research and infrastructure teams to accelerate the pace of frontier model development. They are not project trackers. They are force multipliers who bring clarity to ambiguity, drive decisions when the path forward is unclear, and ensure that the work happening across multiple teams connects into a coherent whole.

This is a foundational role. Reflection is building model evals and safety from the ground up, and this RPM will be at the center of that effort. You won't be stepping into an established function with existing processes and tooling. You will be the person who figures out what this function needs to look like, stands it up, and makes it real. That means defining the evaluation frameworks, building the operational infrastructure for model safety, establishing the processes that connect evals to the model development lifecycle, and laying the groundwork for how Reflection interfaces with the broader safety ecosystem. This is 0-to-1 work in its purest form.

You bring a first-responder mentality. When things go sideways, you don't wait to be asked. You jump in, assess the situation, cut through noise, align the people who need to be aligned, and drive resolution.

What You'll Do

Build the foundational infrastructure for model evals and safety at Reflection. Define the evaluation frameworks, tooling requirements, and operational processes that will underpin how we assess model capabilities, risks, and readiness for release.

Stand up model safety operations as a function, including establishing the workflows, review cadences, and decision frameworks that connect safety evaluation to the model development and release lifecycle.

Partner with research and engineering leads across pre-training, mid-training, and post-training to embed safety and evaluation checkpoints into the development process in a way that is rigorous without being a bottleneck.

Drive the scoping and prioritization of eval science and eval infrastructure investments, working with technical leads to determine what to build in-house, what to adopt, and where to invest research effort.

Establish Reflection's engagement with the external safety ecosystem, including third-party assessments, academic partnerships, and industry safety frameworks. Represent the company's safety posture to external stakeholders with credibility and clarity.

Create visibility and reporting structures that give leadership a clear, honest picture of model safety status, evaluation coverage, and open risks, so they can make informed decisions at the pace the business requires.

Champion a culture of blameless post-mortems and continuous learning, turning every safety-relevant finding into a concrete improvement to our systems and processes.

About You

7+ years of experience in technical program management, research operations, or ML engineering, with demonstrated experience standing up new functions, teams, or programs from scratch.

Familiar with the landscape of model evaluation and AI safety, including evaluation methodologies, red-teaming, alignment research, and the evolving regulatory and industry safety ecosystem. You don't need to be a safety researcher, but you need to understand the space well enough to make sound judgments about what matters and what to prioritize.

Deep enough technically to engage with researchers and engineers on topics like model behavior, evaluation design, data pipelines, and safety-critical system architecture. You follow the technical thread and you know when something doesn't add up.

Proven ability to build structure where none exists. You've taken ambiguous mandates and turned them into functioning programs with clear ownership, measurable outcomes, and durable processes.

Strong stakeholder management skills spanning deeply technical ICs, research leadership, and external partners. You build trust through competence and follow-through.

Excited to build from zero to one. We are a small, fast-moving team and this role will help define how model safety and evaluation work at Reflection.

Motivated by enabling researchers and engineers to build the world's most capable open-weight AI systems, responsibly.

What We Offer

We believe that to build superintelligence that is truly open, you need to start at the foundation. Joining Reflection means building from the ground up as part of a small, talent-dense team. You will help define our future as a company, and help define the frontier of open foundational models.

We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.

Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.

Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.

Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.

Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.

Opportunities to connect with teammates: Lunch and dinner are provided daily. We have regular off-sites and team celebrations.

Vacancy posted 5 days ago

Similar jobs that could be interesting for you
Based on the Research Program Manager - Model Evals and Safety vacancy in San Francisco, CA

Technical Program Manager - Adversarial Model Research
...surfacing vulnerabilities, and collaborating closely with researchers to strengthen model reliability and public trust. About the Role As a Technical Program Manager, you will lead initiatives that test the safety and robustness of OpenAI’s models through creative...
Suggested
Work at office
Relocation package
OpenAI
San Francisco, CA
3 days ago
Research Program Manager
$120k - $200k
...intersection of labor markets and AI research. We partner with leading AI... ...network trains frontier AI models in the same way teachers... ...frontier AI models. As a Research Program Manager, you will play a central role... ...Face) Run and monitor new evals Support with marketing for...
Suggested
Work at office
Relocation package
Monday to Friday
Mercor
San Francisco, CA
4 days ago
Research Program Manager
$95 - $105 per hour
...Our client, a leading technology organization specializing in global communication platforms, is seeking a Research Program Manager IV to join their team. As a Research Program Manager IV, you will be part of the Data & Operations department supporting cross‑functional...
Suggested
Hourly pay
Weekly pay
Temporary work
Remote work
Flexible hours
ManpowerGroup Global, Inc.
Daly City, CA
4 days ago
Neuro Memory & Aging Research Program Manager
A leading university in health sciences is seeking an Admin Research Project Manager to oversee project management for NIH-funded research on neurodegenerative disorders. The role involves developing timelines, collaborating with various teams, and managing budgets. Candidates...
Suggested
Full time
University of California - San Francisco
San Francisco, CA
2 days ago
Model Policy
$207k - $295k
...About the Team Our Safety Systems team is at the forefront of OpenAI's mission to build... .... Within Safety Systems, the Model Policy team aligns model behavior with desired... ...model behavior. You will work closely with research, engineering, product, preparedness, and...
Suggested
Work at office
Work from home
Relocation package
Shift work
OpenAI
San Francisco, CA
3 days ago
Remote Research Program Manager - Data & Ops Lead
...A leading technology organization is seeking a Research Program Manager IV to support cross-functional teams. This remote role requires at least 8 years of program management experience and 3 years in analytics. Key responsibilities include managing program operations...
Remote work
ManpowerGroup Global, Inc.
Daly City, CA
4 days ago
Remote Research Program Manager: Data & Ops Leader
...A leading technology organization is seeking a Research Program Manager IV to manage cross-functional projects. This role involves operational cadence, documentation improvement, and data analysis. The ideal candidate will have extensive program/project management experience...
Remote work
ManpowerGroup Global, Inc.
Daly City, CA
4 days ago
Lab & Research Operations Manager — Equity & Impact
Becoming in San Francisco is searching for a Lab, Research & Operations Manager who will be in charge of laboratory safety, compliance, and operational excellence. The ideal candidate will thrive in a hands-on role, balancing in vivo research and management of lab operations...
Becoming
San Francisco, CA
2 days ago
[Expression of Interest] Research Manager, Interpretability
$340k - $425k
...growing group of committed researchers, engineers, policy... ...what modern language models are capable of, do you... ...core research bets on AI safety. We believe that a... ...networks as binary computer programs we're trying to "... ...About the role: As a manager on the Interpretability...
Work at office
Visa sponsorship
Flexible hours
3 days per week
Menlo Ventures
San Francisco, CA
5 days ago
Research Program Manager - Research Infrastructure
...to all. We’re developing open weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind... ...Anthropic and beyond. About the Role Research Program Managers at Reflection are high-leverage leaders...
Full time
Relocation package
B Capital
San Francisco, CA
3 days ago
Technical Program Manager, Research
$365k
...growing group of committed researchers, engineers, policy... ...works across the full model development lifecycle,... ...interpretability, and safety, each operating at the... .... As a Technical Program Manager for Research, you'll... ...research areas like compute, evals, RL environments, and...
Work at office
Visa sponsorship
Flexible hours
Shift work
Anthropic
San Francisco, CA
3 days ago
Research Program Manager, Infrastructure & Training Ops
B Capital is seeking a Research Program Manager to scale research infrastructure for frontier model development in San Francisco. The ideal candidate will have over 7 years of experience in technical program management within ML/AI, possessing deep technical knowledge...
B Capital
San Francisco, CA
3 days ago
Technical Program Manager, Frontier Evals
$207k - $230k
...the Team OpenAI's Frontier Evals team designs and builds evaluations... ...of our most advanced models. As a research team, we advance both the... ...About the Role As a Technical Program Manager on Frontier Evals, you will... ..., human data, product, safety, legal, external vendors, and...
Work at office
Relocation package
OpenAI
San Francisco, CA
2 days ago
Interpretability Research Manager — AI Safety Leadership
$340k - $425k
A leading AI research organization is seeking a Manager for its Interpretability team in San Francisco. The ideal candidate will have a strong background in managing technical teams and a passion for AI safety research. This role involves overseeing project execution,...
Work at office
Flexible hours
Menlo Ventures
San Francisco, CA
5 days ago
Remote Internal/EM Physician for AI Model Tuning
...A leading AI research accelerator based in San Francisco is seeking a medical expert in internal or emergency medicine for a remote contractor... ...capabilities, ensuring high-quality patient care and safety. Ideal candidates will have an MD, strong clinical experience,...
For contractors
Remote work
Flexible hours
Turing
San Francisco, CA
4 days ago
Model Policy Architect: AI Safety & Risk (Hybrid)
$207k - $295k
A leading AI research company in San Francisco is looking for a Senior Policy role focused on model safety. The candidate will design policies to ensure safe AI behavior while advancing complex AI challenges. Candidates should have extensive experience in ML policies and...
OpenAI
San Francisco, CA
4 days ago
Director of AI Model Training & Research
A leading AI research firm in San Francisco is looking for a Research Engineering Manager to lead a team of AI researchers and engineers. The role involves developing state-of-the-art models, refining training processes, and integrating these models into products. Candidates...
Perplexity AI Inc.
San Francisco, CA
1 day ago
ML Engineer, Public Sector: Model Evaluations & Safety
$208k - $300k
...Public Sector to develop automated evaluation pipelines for AI models. You will work on advanced AI systems and ensure they perform... ...mission-critical environments. Ideal candidates have a strong programming background and experience in ML evaluation frameworks. Competitive...
Scale AI, Inc.
San Francisco, CA
2 days ago
AI Safety Lead: Red Team & Model Risk
A leading AI research firm in San Francisco is seeking an experienced professional to own the red-teaming pipeline for their models, ensuring safety and alignment. The ideal candidate has a graduate degree in Computer Science or a related field, along with a deep understanding...
Reflection
San Francisco, CA
5 days ago
Head of Research
...Head of Research About the Company Respected AI research lab Industry Research... ...prioritization, publication strategy, and managing the relationship between research and... ...thinking, particularly in the areas of safety and the balance between research freedom...
Confidential
San Francisco, CA
2 days ago
Technical Program Manager, Safeguards (Infrastructure & Evals)
$290k - $365k
...growing group of committed researchers, engineers, policy... ...systems that sit between our models and the real world.... ..., but reliable: when a safety-critical pipeline goes... ...closely. As a Technical Program Manager for Safeguards Infrastructure and Evals, you’ll own the...
Work at office
Visa sponsorship
Flexible hours
Shift work
Menlo Ventures
San Francisco, CA
5 days ago
Research Engineer - Evals
$160k - $240k
Research Engineer — Evals Location: San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10) Employment Type: Full time Department: Engineering... ...websites, formats, and edge cases. As we layer in models and agent workflows, the question "did that work?" gets harder...
Full time
Temporary work
Work at office
Remote work
AI Chopping Block, Inc.
San Francisco, CA
4 days ago
Research Engineer, Evals
.... We’re looking for a Research Engineer to help define... ...we measure and improve model quality. You’ll build the... ...run offline and online evals that measure model... ...modern ML systems Strong programming skills and comfort working... ...fraud, risk, trust and safety, compliance, or other...
Variance
San Francisco, CA
4 days ago
Research Engineer, Frontier Red Team (Autonomy)
$320k
...growing group of committed researchers, engineers, policy... ...and ensuring safety with self‑improving, highly... ...ll build and evaluate model organisms of autonomous... ...adversarial AI. Create evals and training environments... ...(we love pair programming!). Care deeply about...
Relocation
Visa sponsorship
Anthropic
San Francisco, CA
5 days ago
Research Engineer - Benchmarking, Evals & Failure Analysis
...talent network trains frontier AI models in the same way teachers... ...team. You’ll work alongside researchers, operators, and AI companies... ...reasoning. You’ll design and run evals, build rubrics and scorers,... ...tool use, reasoning errors, safety/alignment issues); categorize...
Work at office
Mercor
San Francisco, CA
4 days ago
Research Engineer, Frontier Evals - Finance
$310k - $380k
...About the team The Frontier Evals team builds north star model evaluations to drive progress towards safe... ...loops to steer our training, safety, and launch decisions. Some of the team... ...About you We are seeking exceptional research engineers that can push the...
Work at office
Local area
Relocation package
Flexible hours
OpenAI
San Francisco, CA
more than 2 months ago
Senior AI Safety Program Manager
$192k - $272k
Lila Sciences in San Francisco is seeking a Senior or Principal Technical Program Manager to lead operational efforts in AI safety. This role requires 6+ years of relevant program management experience and strong communication skills to bridge diverse teams. You will actively...
Lila Sciences
San Francisco, CA
1 day ago
Communications Manager, Research
$255k
...Communications Manager, Research New York City, NY; San Francisco, CA;... ...for Anthropic's research and model training work. This role will... ...frontier AI research and safety work more accessible and meaningful... ...and thought leadership programs — positioning technical leaders...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
15 days ago
Clinical Research Manager - Heme Malignancy Program
$109.4k - $166.3k
Clinical Research Manager - Heme Malignancy Program Job Summary We are looking to hire a Clinical Research Manager to help develop, shape and grow the clinical research team in the Oncology research program. The Heme Malignancy research program is a fast‑paced environment...
Work experience placement
University of California - San Francisco
San Francisco, CA
2 days ago
Program Manager, AV Operational Safety
$118k
About the Role We are seeking a highly experienced Operational Safety Program Manager to drive effective safety programs for fleet operators within Uber's autonomous mobility and delivery ecosystem. This role demands a deep understanding of Safety Management Systems (SMS...
Full time
Uber
San Francisco, CA
5 days ago
