Software Engineer - AI Evaluation

$60 - $100 per hour
Full-time

Mercor

About the job

Mercor connects elite creative and technical talent with leading AI research labs. Mercor is headquartered in San Francisco, and our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Software Engineering, Data Science, and Systems Design Experts

Type: Contract

Compensation: $60–$100/hour

Location: Remote

Role Responsibilities

  • Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
  • Conduct fact-checking using trusted public sources and authoritative references.
  • Conduct accuracy testing by executing code and validating outputs using appropriate tools.
  • Annotate model responses by identifying strengths, areas of improvement, and factual or conceptual inaccuracies.
  • Assess code quality, readability, algorithmic soundness, and explanation quality.
  • Ensure model responses align with expected conversational behavior and system guidelines.

Qualifications

Must-Have

  • BS, MS, or PhD in Computer Science or a closely related field.
  • Significant (3+ years) real-world experience in software engineering or related technical roles.
  • Expert in at least two relevant programming languages (e.g., Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS).
  • Able to solve HackerRank or LeetCode Medium- and Hard-level problems independently.
  • Experience contributing to well-known open-source projects, including merged pull requests.
  • Significant experience using LLMs while coding, with a solid understanding of their strengths and failure modes.
  • Strong attention to detail and comfort evaluating complex technical reasoning and identifying subtle bugs or logical flaws.

Preferred

  • Prior experience with RLHF, model evaluation, or data annotation work.
  • Track record in competitive programming.
  • Experience reviewing code in production environments.
  • Familiarity with multiple programming paradigms or ecosystems.
  • Experience explaining complex technical concepts to non-expert audiences.

Application Process (Takes 20–30 mins to complete)

  • Upload resume
  • AI interview based on your resume
  • Submit form

Resources & Support

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.

Vacancy posted 12 hours ago