Software Engineer for AI Model Evaluation [Remote]
$220k
SaidGig
- Remote job
This role focuses on advancing the evaluation and development of cutting-edge coding agents. You will operate at the intersection of AI research, software engineering, and model evaluation, designing the benchmarks, methodologies, and data systems that shape how next-generation coding models are measured and improved.
Key Responsibilities
- Design and own evaluation frameworks for coding agents, including benchmark specifications, scoring methodologies, rubrics, and quality standards.
- Lead end-to-end research initiatives aimed at measuring and enhancing coding model performance across various software engineering tasks.
- Develop high-quality datasets, golden examples, and evaluation protocols that facilitate reliable assessment of frontier coding systems.
- Analyze model behavior and failure modes, identifying systematic weaknesses and translating findings into actionable improvements for training and evaluation.
- Build tooling and infrastructure that support large-scale experimentation, data generation, review workflows, and evaluation pipelines.
- Establish best practices for coding-agent assessment, ensuring methodological rigor, reproducibility, and measurement quality.
- Collaborate closely with researchers, engineers, and applied AI teams to design experiments and evaluate emerging model capabilities.
- Contribute to technical reports, benchmark studies, and client-facing research initiatives that communicate model performance and insights.
Requirements
- Strong software engineering background with expertise in Python, C++, or comparable programming languages.
- 3+ years of experience in software engineering, machine learning, AI research, evaluation, or related technical disciplines.
- Experience designing, reviewing, or validating technical assessments, benchmarks, coding tasks, or evaluation methodologies.
- Familiarity with large language models, coding agents, reinforcement learning, model evaluation, or related AI systems.
- Proven ability to build tooling, automate workflows, and enhance technical processes through systematic experimentation.
- Strong analytical skills with the capacity to investigate model behavior and derive insights from complex technical systems.
- Excellent written and verbal communication skills, with the ability to clearly articulate technical findings to diverse audiences.
- Comfortable operating in fast-paced research environments with significant ambiguity and evolving priorities.
Full-time, remote position.
Compensation
Annual salary ranges from $220,000 to $500,000.
Eligibility
Open to candidates with the required skills and experience, regardless of location.


