
Applied Data Scientist, LLM Evaluation, United States (Remote)

$175k - $275k

Driverai

Austin, TX
  • Remote job

Full-Time, Austin, TX or Remote (any location) - Senior - Product & Engineering - $175k - $275k

Applied Data Scientist, LLM Evaluation

Introduction

At Driver, we’re building systems that turn source code into human language. The tech stack includes a core compiler-like engine, a heavily asynchronous, distributed backend server, and a frontend web application that provides a rich user experience.

About Driver

We’re an early-stage startup backed by Y Combinator and Google Ventures that combines first-principles technical approaches and applied LLM expertise to tackle context engineering at scale. Driver builds the context layer that employees and AI agents alike use to develop software.

Working at Driver

Driver is an early-stage but fast-growing startup. As such, we take advantage of what startups do best: delivery speed, flexibility, and the enjoyment of working with a small, close-knit team.

Organizational and engineering values at Driver include first-principles thinking, correctness by construction, writing things down, experimentation and iteration, pragmatism, a commitment to effective communication and transparency, autonomy, and ambition.

Job Overview

Title: Applied Data Scientist, LLM Evaluation
Location: Remote or Austin, TX

Our value is directly tied to the quality of our content at scale. The platform generates technical documentation through a complex, multi-stage pipeline, producing multiple content types at different levels of abstraction, from individual code elements up to high-level summaries.

Today, changes to models, context strategies, or pipeline architecture are evaluated largely through manual review and intuition. There is no systematic way to answer: “Did this change make our output better, worse, or the same, and for which languages, repo sizes, and content types?” This is a hard problem.

LLM outputs are non-deterministic. Identical inputs produce different outputs across runs, and small variations at early pipeline stages compound into meaningfully different end-user content downstream. Evaluating quality requires methodology that accounts for this: statistical reasoning over multiple runs, an understanding of cascade effects through the pipeline, and rubrics that balance human judgment with automated signals.

This role builds the evaluation function from scratch. You’ll define what “good” means for our generated content, build the infrastructure to measure it, and create the experimental framework that lets the team ship changes with confidence.

What You’ll Do

You’ll own the LLM evaluation strategy at Driver, from first principles to production infrastructure. This is a foundational role: you’re not joining an existing eval team, you’re building it. As the function matures, you’ll seed and grow a team around it.

Define quality metrics and build evaluation datasets
Establish what “good” looks like for each content type across the pipeline. Build and curate gold-standard evaluation datasets across languages and repo archetypes (monorepos, microservices, libraries, applications). Design rubrics that capture accuracy, completeness, usefulness, and readability.

Build benchmarking and experimentation infrastructure
Create automated evaluation pipelines that score output against reference datasets. Instrument the content generation pipeline to support A/B comparisons: run the same codebase through two strategies and compare the results. Build tooling for LLM-as-judge evaluation and regression detection. Integrate evaluation into CI so that pipeline changes come with quality evidence.

Develop automated quality signals at scale
Build quality checks that flag degraded output without requiring human review of every document. Monitor content quality trends over time. Design sampling strategies for human review that maximize signal with minimal annotation effort.
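To make the multi-run A/B comparison described above concrete, here is a minimal Python sketch (Python being the language the posting lists). It is an illustration under stated assumptions, not Driver’s method. It assumes each independent run of a strategy on the same codebase yields one numeric quality score, for example from a rubric or an LLM-as-judge grader. It uses a percentile bootstrap as one possible technique. The function names, scores, and decision rule are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_diff_ci(
    baseline: Sequence[float],
    candidate: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile-bootstrap CI for mean(candidate) - mean(baseline).

    Assumption (hypothetical setup): each sequence holds one quality score per
    independent pipeline run, so run-to-run variance from non-deterministic
    generation is part of what gets resampled.
    Returns (observed_difference, ci_low, ci_high).
    """
    rng = random.Random(seed)  # fixed seed keeps the analysis itself reproducible
    observed = statistics.fmean(candidate) - statistics.fmean(baseline)
    diffs = []
    for _ in range(n_resamples):
        # Resample runs with replacement, independently for each strategy.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return observed, diffs[lo_idx], diffs[hi_idx]


def verdict(ci_low: float, ci_high: float) -> str:
    """Hypothetical decision rule: call a change only if the CI excludes zero."""
    if ci_low > 0:
        return "better"
    if ci_high < 0:
        return "worse"
    return "no detectable difference"


# Hypothetical scores: 8 runs of the same codebase through each strategy.
baseline_scores = [3.1, 3.4, 2.9, 3.3, 3.0, 3.2, 3.5, 3.1]
candidate_scores = [3.6, 3.8, 3.4, 3.9, 3.5, 3.7, 3.6, 3.8]

diff, low, high = bootstrap_diff_ci(baseline_scores, candidate_scores)
print(f"mean diff = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f}): {verdict(low, high)}")
```

Requiring the whole interval to exclude zero is one conservative way to separate signal from run-to-run noise. When scores come from many repositories, resampling whole repositories rather than individual runs keeps per-repository differences paired. Repeating the comparison within each segment (language, repo size, content type) would address the “for which” part of the question.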

Quantify tradeoffs and inform decisions
Run experiments on model selection, context strategies, and pipeline architecture changes. Quantify cost/quality/latency tradeoffs. Partner with the engineering team to turn evaluation insights into shipped improvements.

Qualifications

Education: Bachelor’s, Master’s, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative field.
Experience: Minimum of 3 to 5 years in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI; 7+ years of experience preferred.

Required Technical Skills
  • Strong statistical foundations: experimental design, hypothesis testing, confidence intervals, effect sizes, power analysis.
  • Experience designing and running evaluations for LLM or NLP systems; you’ve thought carefully about what “better” means when outputs are open-ended text.
  • Proficiency in Python and the scientific/data stack (pandas, NumPy, scipy, sklearn).
  • Comfort working in Jupyter notebooks for exploration and prototyping, and turning that work into automated pipelines.
  • Experience with LLM-as-judge approaches, inter-annotator agreement, and rubric design for subjective quality assessment.
  • Familiarity with the practical challenges of non-deterministic systems: variance decomposition, multi-run methodology, and distinguishing signal from noise at scale.
  • Strong data storytelling: you can turn experiment results into clear recommendations that drive engineering and product decisions.

Preferred and Nice-to-Have Technical Skills
  • Experience with LLM APIs and prompt engineering across multiple providers.
  • Familiarity with evaluation frameworks (e.g., RAGAS, DeepEval, custom harnesses).
  • Experience building data pipelines or ETL workflows (Airflow, Dagster, or similar).
  • Comfort with SQL and working directly against production data stores.
  • Experience with visualization tools (Matplotlib, Plotly, Streamlit) for building internal dashboards and reports.
  • Background in code understanding, developer tools, or technical documentation.
  • Experience building or managing annotation pipelines and human evaluation workflows.

Benefits
  • Competitive Compensation Packages - Cash & Equity
  • Flexible Work Culture
  • Unlimited Time Off + 12 Paid Company Holidays
  • Life Insurance & FSA Accounts
  • 401(k) Retirement Accounts - Traditional, Roth, or Both
  • Quarterly Team Offsites

Driver is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Vacancy posted 2 days ago