Data Engineer for AI Benchmark Evaluation [Remote]
$50 per hour
SaidGig
- Remote job
Role Overview
This role contributes to benchmark-driven evaluation projects that focus on real-world data engineering and data science workflows. As a Software Engineer specializing in Data Engineering and Data Science, you will work hands-on with production-like datasets, data pipelines, and data science tasks to evaluate and improve the performance of advanced AI systems. The ideal candidate has a strong foundation in both data engineering and data science and can navigate data preparation, analysis, and model-related workflows within real-world codebases.
Key Responsibilities
- Work with structured and unstructured datasets to support SWE-bench-style evaluation tasks.
- Design, build, and validate data pipelines used in benchmarking and evaluation workflows.
- Perform data processing, analysis, feature preparation, and validation for data science use cases.
- Write, run, and modify Python code to process data and support experiments locally.
- Evaluate data quality, transformations, and outputs for correctness and reproducibility.
- Create clean, well-documented, and reusable data workflows suitable for benchmarking.
- Participate in code reviews to ensure high standards of code quality and maintainability.
- Collaborate with researchers and engineers to design challenging, real-world data engineering and data science tasks for AI systems.
Qualifications
- At least 3 years of overall experience as a Data Engineer, Data Scientist, or Software Engineer (data-focused).
- Strong proficiency in Python for data engineering and data science workflows.
- Demonstrable experience with data processing, analysis, and model-related workflows.
- Solid understanding of machine learning and data science fundamentals.
- Experience working with structured and unstructured data.
- Ability to understand, navigate, and modify complex, real-world codebases.
- Experience writing readable, reusable, maintainable, and well-documented code.
- Strong problem-solving skills, including experience with algorithmic or data-intensive problems.
- Excellent spoken and written English communication skills.
Work Terms
- Commitment Required: At least 4 hours per day and 20 hours per week, including 4 hours of overlap with PST.
- Engagement Type: Contractor assignment (no medical/paid leave).
- Duration of Contract: 3 months (adjustable based on engagement).
Compensation
Compensation details will be discussed during the interview process.
Eligibility
- This position is fully remote.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.


