AI Model Evaluation Program Lead

$300k - $320k

Anthropic

About the role:

We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely with our Research, Trust & Safety, Frontier Redteaming, and Policy teams, you will drive high-priority evaluation projects to build new processes, align metrics with policy, and track measurable progress. You will help build and adapt the model evaluation program to ensure model deployments are rigorous and aligned with our commitment to responsible AI development. The ideal candidate will have a strong technical background and experience managing cross-functional programs in AI development, ML engineering, or related fields.

You’ll be joining a team of Technical Program Managers who own and drive cross-functional programs that align to the company’s top priorities. In this role, you’ll have the opportunity to make a foundational impact as you contribute to the scaling of a centralized TPM function for the company. Extremely strong soft skills are paramount, as our team is front and center in driving many company-wide changes and top-priority initiatives that require generating buy-in, balancing various opinions, and competing for attention in our rapidly scaling environment.

This role is a great fit for someone who has both seen excellence at scale and operated in rapidly scaling, high-ambiguity teams and scope. We are seeking candidates with deep TPM expertise who are also comfortable acting as adaptable generalists who add value fast. We excel at maintaining a broad view of our work while diving deep into the details when necessary. We understand business goals, translate and organize them into technical programs and projects, and drive execution. We are adept at engaging with both non-technical and technical stakeholders at all levels of the company, including executive leadership.

In this role, you will have the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI benefits all of humanity. If you are passionate about responsible AI development, have a strong technical background, and thrive in a fast-paced, collaborative environment, we'd love to hear from you.

Responsibilities:

- Partner with teams like Frontier Risk Evaluations, Security, and Trust & Safety to develop and implement comprehensive evaluation protocols for our latest frontier AI models
- Build a single source of truth for tracking all types of model evaluations as required by our Responsible Scaling Policy, AI safety institutes, the White House, and others
- Develop and maintain procedures for conducting evaluations, including designing test suites, coordinating red team exercises, and analyzing results
- Create and manage dashboards and reporting systems to track model performance, safety metrics, and evaluation outcomes across different AI systems and versions
- Lead cross-functional workshops to identify potential risks and edge cases for evaluation, ensuring thorough coverage of AI capabilities and limitations
- Coordinate with external partners and industry standards bodies to align our evaluation practices with emerging best practices in responsible AI development
- Provide detailed status reports, identifying technical risks, dependencies, and areas requiring additional support
- Facilitate communication and coordination between technical workstreams and stakeholders
- Continuously identify opportunities for technical process improvements and implement changes as needed
- Stay up-to-date with the latest developments in AI safety, ML engineering, and related fields to ensure the program remains at the forefront of responsible AI development

You might be a good fit if you:

- Have several years of experience in technical program management, with a track record of successfully delivering complex technical programs, preferably in AI development, ML engineering, or related fields
- Have experience executing technical programs that require systems and engineering-level knowledge
- Have exceptionally strong interpersonal and communication skills that enable you to influence without authority and to build cross-organizational support, cooperation, and action around initiatives and process adoption
- Have experience with prompt engineering on language models
- Have experience designing and/or running evaluations on Large Language Models
- Have knowledge of emerging AI governance frameworks and best practices
- Have a high tolerance for ambiguity and are able to balance setting strategic priorities with rapid, high-quality execution
- Thrive in unstructured environments, and have a knack for bringing order to chaos

The expected salary range for this position is:

Annual Salary:

$300,000—$320,000 USD

Logistics

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

US visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation and Benefits

Anthropic’s compensation package consists of three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements collectively to be highly competitive with market rates.

Equity - For eligible roles, equity will be a major component of the total compensation. We aim to offer higher-than-average equity compensation for a company of our size, and communicate equity amounts at the time of offer issuance.

US Benefits - The following benefits are for our US-based employees:

- Optional equity donation matching.
- Comprehensive health, dental, and vision insurance for you and all your dependents.
- 401(k) plan with 4% matching.
- 22 weeks of paid parental leave.
- Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
- Stipends for education, home office improvements, commuting, and wellness.
- Fertility benefits via Carrot.
- Daily lunches and snacks in our office.
- Relocation support for those moving to the Bay Area.

UK Benefits - The following benefits are for our UK-based employees:

- Optional equity donation matching.
- Private health, dental, and vision insurance for you and your dependents.
- Pension contribution (matching 4% of your salary).
- 21 weeks of paid parental leave.
- Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
- Health cash plan.
- Life insurance and income protection.
- Daily lunches and snacks in our office.

Vacancy posted 3 days ago
