Senior Software Engineer for LLM Evaluation [Remote]

$40 per hour

SaidGig

Remote
  • Remote job

Role Overview: This position focuses on building LLM evaluation and training datasets that reflect realistic software engineering challenges. You will create verifiable software engineering tasks from public repository histories, using a synthetic approach with a human in the loop, and expand dataset coverage across programming languages and difficulty levels.

Key Responsibilities:

  • Analyze and triage GitHub issues across trending open-source libraries.
  • Set up and configure code repositories, including Dockerization and environment setup.
  • Evaluate unit test coverage and quality.
  • Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
  • Collaborate with researchers to design and identify repositories and issues that present challenges for LLMs.
  • Lead a team of junior engineers on collaborative projects.

Qualifications:

  • Strong experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby.
  • Proficiency with Git, Docker, and basic software pipeline setup.
  • Ability to understand and navigate complex codebases.
  • Comfortable running, modifying, and testing real-world projects locally.
  • Experience contributing to or evaluating open-source projects is a plus.

Nice to Have:

  • Previous participation in LLM research or evaluation projects.
  • Experience building or testing developer tools or automation agents.

Work Terms:

  • Commitment required: 20 hours per week, with some overlap with PST working hours.
  • Employment type: Contractor assignment (no medical benefits or paid leave).
  • Duration of contract: 3 months with an expected start date next week.

Compensation: Competitive compensation commensurate with experience.

Eligibility:

  • This position is fully remote.
  • Open to candidates with the required skills and experience.

Vacancy posted 5 days ago
