Software Engineer - AI Evaluation
$60 - $100 per hour
Mercor
About the job
Mercor connects elite creative and technical talent with leading AI research labs. Mercor is headquartered in San Francisco, and its investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: Software Engineering, Data Science, and Systems Design Experts
Type: Contract
Compensation: $60–$100/hour
Location: Remote
Role Responsibilities
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
- Conduct fact-checking using trusted public sources and authoritative references.
- Conduct accuracy testing by executing code and validating outputs using appropriate tools.
- Annotate model responses by identifying strengths, areas for improvement, and factual or conceptual inaccuracies.
- Assess code quality, readability, algorithmic soundness, and explanation quality.
- Ensure model responses align with expected conversational behavior and system guidelines.
Qualifications
Must-Have
- BS, MS, or PhD in Computer Science or a closely related field.
- Significant (3+ years) real-world experience in software engineering or related technical roles.
- Expert in at least two relevant programming languages (e.g., Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS).
- Able to solve HackerRank or LeetCode Medium- and Hard-level problems independently.
- Experience contributing to well-known open-source projects, including merged pull requests.
- Significant experience using LLMs while coding, and an understanding of their strengths and failure modes.
- Strong attention to detail; comfortable evaluating complex technical reasoning and identifying subtle bugs or logical flaws.
Preferred
- Prior experience with RLHF, model evaluation, or data annotation work.
- Track record in competitive programming.
- Experience reviewing code in production environments.
- Familiarity with multiple programming paradigms or ecosystems.
- Experience explaining complex technical concepts to non-expert audiences.
Application Process (Takes 20–30 mins to complete)
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
- For details about the interview process and platform information, please check:
- For any help or support, contact the email address listed on jobs.jobcopilot.com.
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.