Data Engineer/Data Scientist for AI Benchmark Evaluation [Remote]

$50 per hour

SaidGig

Remote
  • Remote job

Role Overview

This position offers an exciting opportunity for experienced Software Engineers specializing in Data Engineering and Data Science to take part in benchmark-driven evaluation projects. You will work with production-like datasets, data pipelines, and data science tasks designed to evaluate and improve the performance of advanced AI systems. The ideal candidate has a solid foundation in both data engineering and data science and can handle data preparation, analysis, and model-related workflows in real-world codebases.

Key Responsibilities

  • Work with structured and unstructured datasets to support SWE-bench-style evaluation tasks.
  • Design, build, and validate data pipelines used in benchmarking and evaluation workflows.
  • Perform data processing, analysis, feature preparation, and validation for data science use cases.
  • Write, run, and modify Python code to process data and support experiments locally.
  • Evaluate data quality, transformations, and outputs for correctness and reproducibility.
  • Create clean, well-documented, and reusable data workflows suitable for benchmarking.
  • Participate in code reviews to ensure high standards of code quality and maintainability.
  • Collaborate with researchers and engineers to design challenging, real-world data engineering and data science tasks for AI systems.

Qualifications

  • At least 3 years of overall experience as a Data Engineer, Data Scientist, or Software Engineer (data-focused).
  • Strong proficiency in Python for data engineering and data science workflows.
  • Demonstrable experience with data processing, analysis, and model-related workflows.
  • Solid understanding of machine learning and data science fundamentals.
  • Experience working with structured and unstructured data.
  • Ability to understand, navigate, and modify complex, real-world codebases.
  • Experience writing readable, reusable, maintainable, and well-documented code.
  • Strong problem-solving skills, including experience with algorithmic or data-intensive problems.
  • Excellent spoken and written English communication skills.

Work Terms

  • Commitments Required: At least 4 hours per day and a minimum of 20 hours per week, including 4 hours of overlap with PST.
  • Engagement Type: Contractor assignment (no medical/paid leave).
  • Duration of Contract: 3 months (adjustable based on engagement).

Compensation

Compensation details will be discussed during the interview process.

Eligibility

  • This is a fully remote position.
  • Opportunity to work on cutting-edge AI projects with leading LLM companies.

Vacancy posted 7 days ago
Similar jobs that could be interesting for you, based on the Data Engineer/Data Scientist for AI Benchmark Evaluation [Remote] vacancy (Remote)
  • $210k - $260k

     ...the best-in-class Voice AI models powering the...  ...for a Senior Research Engineer to join our streaming...  ...measuring the right things, benchmarking against the right...  ...building and extending evaluation tooling and translating...  ...scripts, work with data pipelines, and are comfortable... 
    Suggested
    Remote work

    Remote Jobs

    New York, NY
    2 days ago
  • An AI technology startup is seeking a Benchmarking Specialist in Palo Alto to design and execute ML evaluation benchmarks. You'll work closely with the R&D team to define data standards and maintain documentation. The ideal candidate has experience in ML/LLM evaluation... 
    Suggested
    Remote job
    Full time
    Immediate start

    Pathway

    Palo Alto, CA
    3 days ago
  •  ...frontier model that solves AI's fundamental memory...  ...with the fastest data processing engine on the market,...  ...Stamirowska, a complexity scientist who created a team consisting...  ...and execute rigorous benchmarks and define dataset...  ..., you will build the evaluation infrastructure that... 
    Suggested
    Permanent employment
    Full time
    Contract work
    Immediate start
    Remote work

    Pathway

    Palo Alto, CA
    2 days ago
  •  ...Applied Data Scientist, LLM Evaluation Introduction At Driver, we're building...  ...a core compiler-like engine, a heavily asynchronous/distributed...  ...layer for employees and AI agents alike to use in...  ...and readability. Build benchmarking and experimentation infrastructure... 
    Suggested
    Remote work
    Flexible hours

    Driver AI Inc.

    United States
    2 days ago
  •  ...global provider of enterprise AI products and services, on a...  ...proprietary AI Studio and AI Engines, the company helps drive the...  ...Machine Learning Engineer / Data Scientist to build and deploy machine learning...  ...to model development, evaluation, deployment, and monitoring—often... 
    Suggested
    Full time
    H1b
    Local area
    Remote work

    GrabJobs

    United States
    16 hours ago
  • $48 per hour

     ...Description Job Description At Kelly® Engineering, we’re passionate about helping you...  ...about this one? We’re seeking a Sr Data Engineer/Scientist to work at a premier biotechnology...  ...advanced analytics, machine learning, and AI initiatives across manufacturing and... 
    Bi-weekly pay
    Hourly pay
    Full time
    Temporary work
    Local area
    Shift work

    Kelly Services

    Puerto Rico
    3 days ago
  •  ...Senior Research Scientist We believe that the...  ...and Amsterdam. The Data Foundation and AI team within Plaid's...  ...production serving, evaluation, and monitoring, enabling...  ..., feature engineering workflows, and monitoring...  ...optimizing for a single benchmark metric. In close... 
    Work experience placement
    Local area
    Remote work

    Plaid

    United States
    4 days ago
  • $150k - $200k

     ...Senior Data Engineer We are seeking a seasoned Senior Data Engineer...  ...autonomy to define engineering benchmarks, mentor fellow engineers,...  ...Lead data platform and vendor evaluations, guiding build vs. buy...  ...support analytics, reporting, AI/ML, and operational decision... 
    Remote work
    Flexible hours
    Night shift

    Ursa Space Systems Inc

    United States
    5 days ago
  •  ...Senior Data Engineer At Inchcape, our vision is to have a connected...  ...compliance. Research and evaluate new features and patterns in...  ...recommendations for adoption, enabling an AI-driven data strategy....  ...self-service. Performance benchmarks and tuning reports... 
    For contractors
    Local area
    Remote work
    Worldwide

    ISS Group

    United States
    2 days ago
  • $1,000 per month

     ...Senior Data Engineer Spellbook is seeking a Senior Data Engineer to...  ...both internal analytics and AI-driven product capabilities,...  ...scheduling workflows. All candidate evaluations, interviews, and hiring...  ...Spellbook uses industry benchmark data to establish compensation... 
    Contract work
    Remote work
    Flexible hours

    Spellbook

    United States
    2 days ago
  • $160k - $174k

     ...growing team of world-class engineering, operations, medical...  ...through value-based, AI-driven precision diagnostic...  ...the Team The BI & Data team at Cleerly provides...  ...architecture and help evaluate trade-offs across build...  ...and is aligned to market benchmarks. Candidates located in... 
    Remote work

    Cleerly, LLC

    New York, NY
    3 days ago
  • $315k

    We are looking for Research Engineers to build “gold standard” evaluations for catastrophic risks, in order to understand what AI Safety Level (ASL) to assign to models. Research leads on this team collaborate with engineers in one of our focus areas: CBRN, Cyber, Autonomy... 
    Currently hiring
    Work at office
    Immediate start
    Home office
    Visa sponsorship
    Relocation package

    Anthropic

    New York, NY
    2 days ago
  • $129.99k - $149.48k

     ...about turning complex data into actionable...  ...insights? As a Health Data Scientist focused on AI & Clinical Data...  ...This is a Science and Engineering and Technical...  ...leadership in data design, evaluation strategy, and...  ...ensure that datasets, benchmarks, and evaluation methods... 
    Full time
    Work at office
    Remote work

    Ripple Effect

    Washington DC
    2 days ago
  •  ...Sinch makes it easy. Our AI-infused Super Network,...  ...and optimize scalable data pipelines and modern data...  ...Collaborate with data scientists, analysts, and product...  ...Strong experience as a Data Engineer or in a similar...  ...interviews designed to evaluate your skills, experience... 
    Remote work
    Worldwide
    Home office
    Flexible hours

    Sinch

    United States
    3 days ago
  • $164.2k - $229.9k

     ...information, visit Analytics Engineer - Consumer Data Science Check out...  ...closely with Data Scientists and members of...  ...a big plus. Agentic AI-assisted development...  ...and country location, benchmarked against similar stage...  ...this information to evaluate your application for... 
    For contractors
    Work experience placement
    Work at office
    Remote work
    Flexible hours

    Reddit

    United States
    1 day ago
  • $190k - $225k

     ...Expert Systems Engineer/Data Scientist Location US-VA-Chantilly ID 2026-83...  ...readiness and capabilities of this client's AI technologies as a blend of systems...  ...requirements into technical solutions, develop evaluation CONOPS, coordinate customer and... 
    Full time
    Remote work

    Markon Solutions

    Chantilly, Loudoun County, VA
    8 days ago
  •  ...EngrewLabs is an AI-native technology company focused on building intelligent automation, data platforms, and next-generation AI solutions...  ...models (LLMs), data engineering, and scalable cloud infrastructure...  ...solutions. * Research and evaluate emerging technologies in... 
    Remote work

    EngrewLabs

    Saint Petersburg, FL
    3 days ago
  • $50k

     ...They build tailored, data‑driven campaigns across...  ...versioning, and cross‑client benchmarking A self‑service...  ...Shopify, etc.) without engineering involvement A first‑...  ...Agentic automation: AI agent orchestration pipelines...  ...automation (we evaluate this directly) Marketing... 

    Softline Solutions, Inc.

    Glendale, CA
    1 day ago
  • SME Careers seeks a remote R Engineer to review AI-generated content and generate high-quality data analysis. The role involves ensuring model integrity, optimizing AI performance, and developing training content. Ideal candidates will have 2+ years of experience in R... 
    Remote job
    Contract work

    SME Careers

    New Bremen, OH
    3 days ago
  •  ...Senior Rust Full-Stack Engineer - AI Data & Infrastructure About the Role What if...  ...data pipelines, annotation tooling, and evaluation systems used by leading AI research...  ...AI/ML workflows, model training, or benchmarking pipelines Experience building distributed... 
    Hourly pay
    Ongoing contract
    Contract work
    Freelance
    Remote work
    Flexible hours

    Alignerr

    New York, NY
    2 days ago
  •  ...Description If you're a senior Data Engineer who thrives on precision,...  ...how the next generation of AI systems reason about data infrastructure...  ...training, annotation, or evaluating AI-generated technical...  ...and responsibly. Support benchmarking efforts by evaluating model... 
    Remote job
    For contractors

    YO IT Consulting

    New York, NY
    1 day ago
  •  ...AI Research Engineer / Data Scientist (LLM) - Mid-Senior Job location: Morristown NJ (tri-state candidate) Role Summary Own...  ...and agentic workflows. You'll drive architecture and evaluation strategy, productionize services with reliability and guardrails... 

    Damco Solutions

    Morristown, NJ
    2 days ago
  • Airbnb, Inc. is hiring a Senior Staff Machine Learning Engineer, focusing on driving evaluation strategies and data infrastructure for CSxAI initiatives. This role...  ...PhD in a relevant field, extensive experience in ML/AI systems, and strong leadership in technical... 
    Remote job

    airbnb, Inc.

    San Francisco, CA
    2 days ago
  • $30 - $60 per hour

    Mercor is seeking detail-oriented generalists to support data quality control and annotation projects with leading AI labs. This part-time role involves reviewing, evaluating, and labeling data outputs to benchmark and improve AI models. The ideal candidate will be... 
    Remote job
    Hourly pay
    Part time
    Immediate start
    10 hours per week

    Mercor

    Carrollton, TX
    2 days ago
  •  ...tech company is looking for a Senior Staff Machine Learning Engineer to drive ML evaluation for customer support initiatives. The ideal candidate will...  ...collaborating with cross-functional teams, and enhancing AI systems. This position is remote eligible, requiring... 
    Remote job

    airbnb, Inc.

    New York, NY
    2 days ago
  •  ...Data Engineer locations: Remote, United States time...  ...an experienced Data Engineer to join our dynamic and...  ...design and implement AI systems at Stord. This...  ...closely with engineers, data scientists, and product managers...  ...techniques, and model evaluation. Experience with... 
    Remote job

    Stord Inc.

    New York, NY
    1 day ago
  • $133.37k - $156.9k

     ...One. Job Description We are seeking a highly skilled AI Data Innovation Engineer to join the Data Innovation and Tools Rationalization...  ...and reducing technology sprawl through disciplined tool evaluation and rationalization. Values | In addition to U.S. Bank... 
    Temporary work
    Work at office
    Local area
    Remote work
    Flexible hours
    3 days per week

    U.S. Bank

    Minneapolis, MN
    4 days ago
  •  ...Senior Software Developer – Ai Data Engineer Caseware is one of Canada's original Fintech...  ...AI system signals (tracing, feedback, evaluation, and usage data) to support observability...  ...AI systems, enabling offline testing, benchmarking, and continuous improvement of... 
    Local area
    Remote work
    Home office
    Flexible hours

    CaseWare

    United States
    1 day ago
  • $204k - $259k

     ...Senior Machine Learning Engineer – VLM/LLM Evaluation Waymo is an autonomous driving...  ...The mission of the Waymo AI Foundations team is to...  ...end evaluation systems and benchmarks for Waymo Foundation models...  ...Implement and extend large scale data and evaluation pipelines.... 
    Full time
    Temporary work
    Remote work

    Waymo

    San Francisco, CA
    2 days ago
  • $155k

     ...About the Team The Data Platform team sits...  ...Databricks, to the semantic and AI layers that sit on top....  ...work for everyone - engineers, analysts, and business...  ...trained, aligned and evaluated (RLHF, fine-tuning, prompt...  ...local cost of labor benchmarks for each specific role,... 
    For contractors
    Local area
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    2 days ago
