Senior Software Engineer for LLM Evaluation [Remote]
$40 per hour
SaidGig
- Remote job
Role Overview: This position focuses on building LLM evaluation and training datasets that target realistic software engineering challenges. The role involves creating verifiable software engineering tasks from public repository histories, using a synthetic approach with a human in the loop, and expanding dataset coverage across programming languages and difficulty levels.
Key Responsibilities:
- Analyze and triage GitHub issues across trending open-source libraries.
- Set up and configure code repositories, including Dockerization and environment setup.
- Evaluate unit test coverage and quality.
- Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
- Collaborate with researchers to design and identify repositories and issues that present challenges for LLMs.
- Lead a team of junior engineers to collaborate on projects.
Qualifications:
- Strong experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby.
- Proficiency with Git, Docker, and basic software pipeline setup.
- Ability to understand and navigate complex codebases.
- Comfortable running, modifying, and testing real-world projects locally.
- Experience contributing to or evaluating open-source projects is a plus.
Nice to Have:
- Previous participation in LLM research or evaluation projects.
- Experience building or testing developer tools or automation agents.
Work Terms:
- Commitment required: 20 hours per week with some overlap with PST.
- Employment type: Contractor assignment (no medical/paid leave).
- Duration of contract: 3 months with an expected start date next week.
Compensation: Competitive compensation commensurate with experience.
Eligibility:
- This position is fully remote.
- Open to candidates with the required skills and experience.


