Data Engineer/Data Scientist for AI Benchmark Evaluation [Remote]

$50 per hour

SaidGig

Remote job

Role Overview

This position offers an exciting opportunity for experienced Software Engineers specializing in Data Engineering and Data Science to engage in benchmark-driven evaluation projects. You will work with production-like datasets, data pipelines, and data science tasks aimed at evaluating and enhancing the performance of advanced AI systems. The ideal candidate will possess a solid foundation in both data engineering and data science, with the capability to navigate data preparation, analysis, and model-related workflows in real-world codebases.

Key Responsibilities

Work with structured and unstructured datasets to support SWE-bench-style evaluation tasks.
Design, build, and validate data pipelines used in benchmarking and evaluation workflows.
Perform data processing, analysis, feature preparation, and validation for data science use cases.
Write, run, and modify Python code to process data and support experiments locally.
Evaluate data quality, transformations, and outputs for correctness and reproducibility.
Create clean, well-documented, and reusable data workflows suitable for benchmarking.
Participate in code reviews to ensure high standards of code quality and maintainability.
Collaborate with researchers and engineers to design challenging, real-world data engineering and data science tasks for AI systems.

Qualifications

At least 3 years of overall experience as a Data Engineer, Data Scientist, or data-focused Software Engineer.
Strong proficiency in Python for data engineering and data science workflows.
Demonstrable experience with data processing, analysis, and model-related workflows.
Solid understanding of machine learning and data science fundamentals.
Experience working with structured and unstructured data.
Ability to understand, navigate, and modify complex, real-world codebases.
Experience writing readable, reusable, maintainable, and well-documented code.
Strong problem-solving skills, including experience with algorithmic or data-intensive problems.
Excellent spoken and written English communication skills.

Work Terms

Commitment Required: At least 4 hours per day and 20 hours per week, with 4 hours of overlap with PST.
Engagement Type: Contractor assignment (no medical/paid leave).
Duration of Contract: 3 months (adjustable based on engagement).

Compensation

Compensation details will be discussed during the interview process.

Eligibility

This is a fully remote position.
Opportunity to work on cutting-edge AI projects with leading LLM companies.

Vacancy posted 7 days ago