Software Engineer - AI Evaluation

$60 - $100 per hour

Full-time

Mercor

About the job

Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Software Engineering, Data Science, and Systems Design Experts

Type: Contract

Compensation: $60–$100/hour

Location: Remote

Role Responsibilities

Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
Conduct fact-checking using trusted public sources and authoritative references.
Conduct accuracy testing by executing code and validating outputs using appropriate tools.
Annotate model responses by identifying strengths, areas of improvement, and factual or conceptual inaccuracies.
Assess code quality, readability, algorithmic soundness, and explanation quality.
Ensure model responses align with expected conversational behavior and system guidelines.

Qualifications

Must-Have

BS, MS, or PhD in Computer Science or a closely related field.
Significant (3+ years) real-world experience in software engineering or related technical roles.
Expert in at least two relevant programming languages (e.g., Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS).
Able to solve HackerRank or LeetCode Medium and Hard-level problems independently.
Experience contributing to well-known open-source projects, including merged pull requests.
Significant experience using LLMs while coding and understanding their strengths and failure modes.
Strong attention to detail and comfort evaluating complex technical reasoning, including identifying subtle bugs or logical flaws.

Preferred

Prior experience with RLHF, model evaluation, or data annotation work.
Track record in competitive programming.
Experience reviewing code in production environments.
Familiarity with multiple programming paradigms or ecosystems.
Experience explaining complex technical concepts to non-expert audiences.

Application Process (Takes 20–30 mins to complete)

Upload resume
AI interview based on your resume
Submit form

Resources & Support

For details about the interview process and platform information, please check:
For any help or support, reach out to: View email address on jobs.jobcopilot.com

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.

Vacancy posted 12 hours ago
