Software Engineer for AI Model Evaluation [Remote]

$220k

SaidGig

United States
  • Remote job

This role focuses on advancing the evaluation and development of cutting-edge coding agents. You will operate at the intersection of AI research, software engineering, and model evaluation, designing the benchmarks, methodologies, and data systems that shape how next-generation coding models are measured and improved.

Key Responsibilities

  • Design and own evaluation frameworks for coding agents, including benchmark specifications, scoring methodologies, rubrics, and quality standards.
  • Lead end-to-end research initiatives aimed at measuring and enhancing coding model performance across various software engineering tasks.
  • Develop high-quality datasets, golden examples, and evaluation protocols that facilitate reliable assessment of frontier coding systems.
  • Analyze model behavior and failure modes, identifying systematic weaknesses and translating findings into actionable improvements for training and evaluation.
  • Build tooling and infrastructure that support large-scale experimentation, data generation, review workflows, and evaluation pipelines.
  • Establish best practices for coding-agent assessment, ensuring methodological rigor, reproducibility, and measurement quality.
  • Collaborate closely with researchers, engineers, and applied AI teams to design experiments and evaluate emerging model capabilities.
  • Contribute to technical reports, benchmark studies, and client-facing research initiatives that communicate model performance and insights.
Qualifications
  • Strong software engineering background with expertise in Python, C++, or comparable programming languages.
  • 3+ years of experience in software engineering, machine learning, AI research, evaluation, or related technical disciplines.
  • Experience designing, reviewing, or validating technical assessments, benchmarks, coding tasks, or evaluation methodologies.
  • Familiarity with large language models, coding agents, reinforcement learning, model evaluation, or related AI systems.
  • Proven ability to build tooling, automate workflows, and enhance technical processes through systematic experimentation.
  • Strong analytical skills with the capacity to investigate model behavior and derive insights from complex technical systems.
  • Excellent written and verbal communication skills, with the ability to clearly articulate technical findings to diverse audiences.
  • Comfortable operating in fast-paced research environments with significant ambiguity and evolving priorities.
Work Terms

Full-time, remote position.

Compensation

Annual salary ranges from $220,000 to $500,000.

Eligibility

Open to candidates with the required skills and experience, regardless of location.

Vacancy posted 23 days ago
