AI Model Evaluation Program Lead

$300k - $320k

Anthropic

About the role:

We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely with our Research, Trust & Safety, Frontier Redteaming, and Policy teams, you will drive high-priority evaluation projects to build new processes, align metrics with policy, and track measurable progress. You will help build and adapt the model evaluation program to ensure model deployments are rigorous and aligned with our commitment to responsible AI development. The ideal candidate will have a strong technical background and experience managing cross-functional programs in AI development, ML engineering, or related fields.

You’ll be joining a team of Technical Program Managers who own and drive cross-functional programs that align to the company’s top priorities. In this role, you’ll have the opportunity to make a foundational impact as you contribute to the scaling of a centralized TPM function for the company. Extremely strong soft skills are paramount, as our team is front and center in driving many company-wide changes and top-priority initiatives that require generating buy-in, balancing various opinions, and competing for attention in our rapidly scaling environment. This role is a great fit for someone who has both seen excellence at scale and operated in rapidly scaling, high-ambiguity teams and scope.

We are seeking candidates who have deep TPM expertise but are comfortable acting as adaptable generalists who add value fast. We excel at maintaining a broad view of our work while diving deep into the details when necessary. We understand business goals, translate and organize them into technical programs and projects, and drive execution. We are adept at engaging with both non-technical and technical stakeholders at all levels of the company, including executive leadership.

In this role, you will have the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI benefits all of humanity. If you are passionate about responsible AI development, have a strong technical background, and thrive in a fast-paced, collaborative environment, we'd love to hear from you.

Responsibilities:

Partner with teams like Frontier Risk Evaluations, Security, and Trust & Safety to develop and implement comprehensive evaluation protocols for our latest frontier AI models
Build a single source of truth for tracking all types of model evaluations as required by our Responsible Scaling Policy, AI safety institutes, the White House, and others
Develop and maintain procedures for conducting evaluations, including designing test suites, coordinating red team exercises, and analyzing results
Create and manage dashboards and reporting systems to track model performance, safety metrics, and evaluation outcomes across different AI systems and versions
Lead cross-functional workshops to identify potential risks and edge cases for evaluation, ensuring thorough coverage of AI capabilities and limitations
Coordinate with external partners and industry standards bodies to align our evaluation practices with emerging best practices in responsible AI development
Provide detailed status reports, identifying technical risks, dependencies, and areas requiring additional support
Facilitate communication and coordination between technical workstreams and stakeholders
Continuously identify opportunities for technical process improvements and implement changes as needed
Stay up-to-date with the latest developments in AI safety, ML engineering, and related fields to ensure the program remains at the forefront of responsible AI development

You might be a good fit if you:

Have several years of experience in technical program management, with a track record of successfully delivering complex technical programs, preferably in AI development, ML engineering, or related fields
Have experience executing technical programs that require systems and engineering-level knowledge
Have exceptionally strong interpersonal and communication skills that enable you to influence without authority, build cross-organizational support, cooperation and action around initiatives and process adoption
Have experience prompt engineering on language models
Have experience designing and/or running evaluations on Large Language Models
Have knowledge of emerging AI governance frameworks and best practices
Have a high threshold for navigating ambiguity and are able to balance setting strategic priorities with rapid, high-quality execution
Thrive in unstructured environments, and have a knack for bringing order to chaos

The expected salary range for this position is:

Annual Salary:

$300,000—$320,000 USD

Logistics

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

US visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation and Benefits

Anthropic’s compensation package consists of three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements collectively to be highly competitive with market rates.

Equity - For eligible roles, equity will be a major component of the total compensation. We aim to offer higher-than-average equity compensation for a company of our size, and communicate equity amounts at the time of offer issuance.

US Benefits - The following benefits are for our US-based employees:

Optional equity donation matching.
Comprehensive health, dental, and vision insurance for you and all your dependents.
401(k) plan with 4% matching.
22 weeks of paid parental leave.
Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
Stipends for education, home office improvements, commuting, and wellness.
Fertility benefits via Carrot.
Daily lunches and snacks in our office.
Relocation support for those moving to the Bay Area.

UK Benefits - The following benefits are for our UK-based employees:

Optional equity donation matching.
Private health, dental, and vision insurance for you and your dependents.
Pension contribution (matching 4% of your salary).
21 weeks of paid parental leave.
Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
Health cash plan.
Life insurance and income protection.
Daily lunches and snacks in our office.


Vacancy posted 1 day ago

