Applied Data Scientist, LLM Evaluation, United States (Remote)

$175k - $275k

Driverai

Remote job

Full-Time in Austin, TX or Remote (any location) - Senior - Product & Engineering - $175k - $275k

Applied Data Scientist, LLM Evaluation

Introduction
At Driver, we’re building systems that turn source code into human language. The tech stack includes a core compiler-like engine, a heavily asynchronous/distributed backend server, and a frontend web application that provides a rich user experience.

About Driver
We’re an early-stage startup backed by Y Combinator and Google Ventures that combines first-principles technical approaches and applied LLM expertise to tackle context engineering at scale. Driver builds the context layer that employees and AI agents alike use in developing software.

Working at Driver
Driver is an early-stage but fast-growing startup. As such, we take advantage of what startups excel at: delivery speed, flexibility, and the enjoyment of working with a small, close-knit team. Organizational and engineering values at Driver include first-principles thinking, correct by construction, writing things down, experimentation and iteration, pragmatism, commitment to effective communication and transparency, autonomy, and ambition.

Job Overview
Title: Applied Data Scientist, LLM Evaluation
Location: Remote or Austin, TX

Our value is directly tied to the quality of our content at scale. The platform generates technical documentation across a complex, multi-stage pipeline, producing multiple content types at different levels of abstraction, from individual code elements up to high-level summaries. Today, changes to models, context strategies, or pipeline architecture are evaluated largely through manual review and intuition. There is no systematic way to answer: “Did this change make our output better, worse, or the same, and for which languages, repo sizes, and content types?” This is a hard problem.
LLM outputs are non-deterministic: identical inputs produce different outputs across runs, and small variations at early pipeline stages compound into meaningfully different end-user content downstream. Evaluating quality requires methodology that accounts for this: statistical reasoning over multiple runs, understanding of cascade effects through the pipeline, and rubrics that balance human judgment with automated signals.

This role builds the evaluation function from scratch. You’ll define what “good” means for our generated content, build the infrastructure to measure it, and create the experimental framework that lets the team ship changes with confidence.

What You’ll Do
You’ll own the LLM evaluation strategy at Driver, from first principles to production infrastructure. This is a foundational role: you’re not joining an existing eval team, you’re building it. As the function matures, you’ll seed and grow a team around it.

Define quality metrics and build evaluation datasets. Establish what “good” looks like for each content type across the pipeline. Build and curate gold-standard evaluation datasets across languages and repo archetypes (monorepos, microservices, libraries, applications). Design rubrics that capture accuracy, completeness, usefulness, and readability.
Build benchmarking and experimentation infrastructure. Create automated evaluation pipelines that score output against reference datasets. Instrument the content generation pipeline to support A/B comparisons: run the same codebase through two strategies and compare results. Build tooling for LLM-as-judge evaluation and regression detection. Integrate evaluation into CI so pipeline changes come with quality evidence.
Develop automated quality signals at scale. Build quality checks that flag degraded output without requiring human review of every document. Monitor content quality trends over time. Design sampling strategies for human review that maximize signal with minimal annotation effort.
Quantify tradeoffs and inform decisions. Run experiments on model selection, context strategies, and pipeline architecture changes. Quantify cost/quality/latency tradeoffs. Partner with the engineering team to turn evaluation insights into shipped improvements.

Qualifications
Education: Bachelor’s, Master’s, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative field.
Experience: Minimum 3-5 years in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI. 7+ years of experience preferred.

Required Technical Skills
Strong statistical foundations: experimental design, hypothesis testing, confidence intervals, effect sizes, power analysis.
Experience designing and running evaluations for LLM or NLP systems: you’ve thought carefully about what “better” means when outputs are open-ended text.
Proficient in Python and the scientific/data stack (pandas, NumPy, scipy, sklearn).
Comfortable working in Jupyter notebooks for exploration and prototyping, and turning that work into automated pipelines.
Experience with LLM-as-judge approaches, inter-annotator agreement, and rubric design for subjective quality assessment.
Familiarity with the practical challenges of non-deterministic systems: variance decomposition, multi-run methodology, distinguishing signal from noise at scale.
Strong data storytelling: you can turn experiment results into clear recommendations that drive engineering and product decisions.

Preferred and Nice-to-Have Technical Skills
Experience with LLM APIs and prompt engineering across multiple providers.
Familiarity with evaluation frameworks (e.g., RAGAS, DeepEval, custom harnesses).
Experience building data pipelines or ETL workflows (Airflow, Dagster, or similar).
Comfort with SQL and working directly against production data stores.
Experience with visualization tools (Matplotlib, Plotly, Streamlit) for building internal dashboards and reports.
Background in code understanding, developer tools, or technical documentation.
Experience building or managing annotation pipelines and human evaluation workflows.

Competitive Compensation Packages - Cash & Equity
Flexible Work Culture
Unlimited Time Off + 12 Paid Company Holidays
Life Insurance & FSA Accounts
401(k) Retirement Accounts - Traditional, Roth, or Both
Quarterly Team Offsites

Driver is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Vacancy posted 2 days ago
