Remote Senior Python Engineer - LLM Evaluation (US-based)
Turing
About Us:
Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways. First, it accelerates frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in software engineering, logical reasoning, STEM, multilinguality, multimodality, and agents. Second, it applies that expertise to help enterprises turn AI from proof of concept into proprietary intelligence, with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.
Ideal Background:
This role is ideal for engineers who have shipped high-impact products at fast-moving companies like Stripe, Airbnb, Cloudflare, Datadog, Coinbase, or similar high-growth engineering environments. We especially welcome graduates from programs with strong CS foundations such as University of Washington, University of Illinois Urbana-Champaign, UT Austin, University of Michigan, Purdue, and comparable institutions — though exceptional experience and skill always take precedence over pedigree.
Project Overview:
As a Software Engineering evaluator, you will create cutting-edge datasets for training, benchmarking, and advancing large language models, collaborating closely with researchers. This includes curating code examples, providing precise solutions, and making corrections across the full stack — in Python for backend and ML workflows, and JavaScript (React, Node.js) for frontend and API layers, alongside C/C++, Java, Rust, and Go. You will evaluate and refine AI-generated code for efficiency, scalability, and reliability, and work with cross-functional teams to enhance enterprise-level AI-driven coding solutions.
What Does a Typical Day Look Like?
- Work on AI model training initiatives by curating code examples, building solutions, and correcting code across both Python and JavaScript (React, Node.js), with additional work in C/C++, Java, Rust, and Go.
- Evaluate and refine AI-generated code across backend and frontend contexts to ensure that it is efficient, scalable, and reliable.
- Collaborate with cross-functional teams to improve AI-driven coding solutions, measured against industry performance benchmarks.
- Build agents that can verify the quality of the code and identify error patterns across full-stack applications.
- Form hypotheses about each step of the software engineering lifecycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on each.
- Design verification mechanisms that can automatically verify a solution to a software engineering task.
Required Skills:
- 3 or more years of software engineering experience.
- Strong expertise in building full-stack applications using Python and JavaScript (React, Node.js), with the ability to work across backend and frontend codebases.
- Experience deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
Engagement Details:
- Commitment: flexible engagement, minimum 10 hrs/week, up to 40 hrs/week
- Type: Contractor (no medical/paid leave)
- Duration: 1 month (potential extensions based on performance and fit)
- Location: Candidates must be based in the United States
Evaluation Process:
- The application process takes 15–30 minutes.
- Completion of an AI video interview is required as part of the assessment.
After applying, you will receive an email with a login link. Please use that link to access the portal and complete your profile.
Know amazing talent? Refer them at turing.com/referrals, and earn money from your network.


