AI Model Evaluation Program Lead
$300k - $320k
Anthropic
About the role:

We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely with our Research, Trust & Safety, Frontier Redteaming, and Policy teams, you will drive high-priority evaluation projects to build new processes, align metrics with policy, and track measurable progress. You will help build and adapt the model evaluation program to ensure that model deployments are rigorous and aligned with our commitment to responsible AI development. The ideal candidate will have a strong technical background and experience managing cross-functional programs in AI development, ML engineering, or related fields.

You'll be joining a team of Technical Program Managers who own and drive cross-functional programs aligned with the company's top priorities. In this role, you'll have the opportunity to make a foundational impact as you contribute to scaling a centralized TPM function for the company. Extremely strong soft skills are paramount: our team is front and center in driving many company-wide changes and top-priority initiatives that require generating buy-in, balancing differing opinions, and competing for attention in a rapidly scaling environment.

This role is a great fit for someone who has both seen excellence at scale and operated in rapidly scaling teams with high-ambiguity scope. We are seeking candidates with deep TPM expertise who are also comfortable acting as adaptable generalists who add value fast. We excel at maintaining a broad view of our work while diving deep into the details when necessary. We understand business goals, translate and organize them into technical programs and projects, and drive execution. We are adept at engaging with both technical and non-technical stakeholders at all levels of the company, including executive leadership.
In this role, you will have the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI benefits all of humanity. If you are passionate about responsible AI development, have a strong technical background, and thrive in a fast-paced, collaborative environment, we'd love to hear from you.

Responsibilities:
- Partner with teams such as Frontier Risk Evaluations, Security, and Trust & Safety to develop and implement comprehensive evaluation protocols for our latest frontier AI models
- Build a single source of truth for tracking all types of model evaluations required by our Responsible Scaling Policy, AI safety institutes, the White House, and others
- Develop and maintain procedures for conducting evaluations, including designing test suites, coordinating red team exercises, and analyzing results
- Create and manage dashboards and reporting systems to track model performance, safety metrics, and evaluation outcomes across different AI systems and versions
- Lead cross-functional workshops to identify potential risks and edge cases for evaluation, ensuring thorough coverage of AI capabilities and limitations
- Coordinate with external partners and industry standards bodies to align our evaluation practices with emerging best practices in responsible AI development
- Provide detailed status reports that identify technical risks, dependencies, and areas requiring additional support
- Facilitate communication and coordination between technical workstreams and stakeholders
- Continuously identify opportunities for technical process improvements and implement changes as needed
- Stay up to date with the latest developments in AI safety, ML engineering, and related fields to keep the program at the forefront of responsible AI development

You might be a good fit if you:
- Have several years of experience in technical program management, with a track record of successfully delivering complex technical programs, preferably in AI development, ML engineering, or related fields
- Have experience executing technical programs that require systems- and engineering-level knowledge
- Have exceptionally strong interpersonal and communication skills that enable you to influence without authority and to build cross-organizational support, cooperation, and action around initiatives and process adoption
- Have experience with prompt engineering on language models
- Have experience designing and/or running evaluations on large language models
- Have knowledge of emerging AI governance frameworks and best practices
- Have a high tolerance for ambiguity and can balance setting strategic priorities with rapid, high-quality execution
- Thrive in unstructured environments and have a knack for bringing order to chaos

The expected salary range for this position is:
Annual Salary: $300,000 to $320,000 USD
Logistics:

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

US visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation and Benefits:

Anthropic's compensation package consists of three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements collectively to be highly competitive with market rates.

Equity: For eligible roles, equity will be a major component of total compensation. We aim to offer higher-than-average equity compensation for a company of our size, and we communicate equity amounts at the time of offer issuance.

US Benefits: The following benefits are for our US-based employees:
- Optional equity donation matching
- Comprehensive health, dental, and vision insurance for you and all your dependents
- 401(k) plan with 4% matching
- 22 weeks of paid parental leave
- Unlimited PTO: most staff take between 4 and 6 weeks each year, sometimes more
- Stipends for education, home office improvements, commuting, and wellness
- Fertility benefits via Carrot
- Daily lunches and snacks in our office
- Relocation support for those moving to the Bay Area

UK Benefits: The following benefits are for our UK-based employees:
- Optional equity donation matching
- Private health, dental, and vision insurance for you and your dependents
- Pension contribution (matching 4% of your salary)
- 21 weeks of paid parental leave
- Unlimited PTO: most staff take between 4 and 6 weeks each year, sometimes more
- Health cash plan
- Life insurance and income protection
- Daily lunches and snacks in our office