Software Engineer for AI Model Evaluation

$220k

SaidGig

This role focuses on advancing the evaluation and development of frontier coding agents, blending AI research, software engineering, and model evaluation. You will design benchmarks, methodologies, and data systems that define how next-generation coding models are assessed and enhanced.

Key Responsibilities

Design and own evaluation frameworks for coding agents, including benchmark specifications, scoring methodologies, rubrics, and quality standards.
Lead end-to-end research initiatives aimed at measuring and improving coding model performance across various software engineering tasks.
Develop high-quality datasets, golden examples, and evaluation protocols for reliable assessment of frontier coding systems.
Analyze model behavior and failure modes, identifying systematic weaknesses and translating findings into actionable improvements for training and evaluation.
Build tooling and infrastructure to support large-scale experimentation, data generation, review workflows, and evaluation pipelines.
Establish best practices for coding-agent assessment, ensuring methodological rigor, reproducibility, and measurement quality.
Collaborate closely with researchers, engineers, and applied AI teams to design experiments and evaluate emerging model capabilities.
Contribute to technical reports, benchmark studies, and client-facing research initiatives that communicate model performance and insights.

Qualifications

Strong software engineering background with expertise in Python, C++, or comparable programming languages.
3+ years of experience in software engineering, machine learning, AI research, evaluation, or related technical disciplines.
Experience designing, reviewing, or validating technical assessments, benchmarks, coding tasks, or evaluation methodologies.
Familiarity with large language models, coding agents, reinforcement learning, model evaluation, or related AI systems.
Proven ability to build tooling, automate workflows, and improve technical processes through systematic experimentation.
Strong analytical skills with the ability to investigate model behavior and derive insights from complex technical systems.
Excellent written and verbal communication skills, with the ability to clearly articulate technical findings to diverse audiences.
Comfortable operating in fast-moving research environments with significant ambiguity and evolving priorities.

Preferred

Experience working on frontier AI systems, coding agents, or model evaluation research.
Deep interest in understanding how data, evaluations, and feedback mechanisms influence model capabilities.
Track record of independently driving ambiguous technical or research projects from conception to execution.
Experience designing benchmarks or datasets for machine learning systems at scale.
Familiarity with agentic workflows, tool use, reinforcement learning, or post-training methodologies.
Publications, open-source contributions, or demonstrated technical leadership in AI research.

Work Terms

Full-time position with remote work flexibility.

Compensation

Annual salary range of $220,000 - $500,000.

Eligibility

Open to candidates with the required skills and experience, regardless of location.

Vacancy posted 2 days ago

