Data Engineer/Data Scientist for AI Benchmark Evaluation [Remote]
$50 per hour
SaidGig
- Remote job
Role Overview
This position offers an exciting opportunity for experienced Software Engineers specializing in Data Engineering and Data Science to engage in benchmark-driven evaluation projects. You will work with production-like datasets, data pipelines, and data science tasks aimed at evaluating and enhancing the performance of advanced AI systems. The ideal candidate will possess a solid foundation in both data engineering and data science, with the capability to navigate data preparation, analysis, and model-related workflows in real-world codebases.
Key Responsibilities
- Work with structured and unstructured datasets to support SWE-bench-style evaluation tasks.
- Design, build, and validate data pipelines used in benchmarking and evaluation workflows.
- Perform data processing, analysis, feature preparation, and validation for data science use cases.
- Write, run, and modify Python code to process data and support experiments locally.
- Evaluate data quality, transformations, and outputs for correctness and reproducibility.
- Create clean, well-documented, and reusable data workflows suitable for benchmarking.
- Participate in code reviews to ensure high standards of code quality and maintainability.
- Collaborate with researchers and engineers to design challenging, real-world data engineering and data science tasks for AI systems.
Qualifications
- At least 3 years of overall experience as a Data Engineer, Data Scientist, or data-focused Software Engineer.
- Strong proficiency in Python for data engineering and data science workflows.
- Demonstrable experience with data processing, analysis, and model-related workflows.
- Solid understanding of machine learning and data science fundamentals.
- Experience working with structured and unstructured data.
- Ability to understand, navigate, and modify complex, real-world codebases.
- Experience writing readable, reusable, maintainable, and well-documented code.
- Strong problem-solving skills, including experience with algorithmic or data-intensive problems.
- Excellent spoken and written English communication skills.
Work Terms
- Commitment Required: At least 4 hours per day and at least 20 hours per week, including 4 hours of overlap with PST.
- Engagement Type: Contractor assignment (no medical/paid leave).
- Duration of Contract: 3 months (adjustable based on engagement).
Compensation
Compensation details will be discussed during the interview process.
Eligibility
- This is a fully remote position.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.


