LLM / RAG Evaluation Engineer

Prophecy Technologies

Job Summary

We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing the quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.

Key Responsibilities

Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
Build and apply evaluation metrics for agentic AI systems, including multi-step reasoning, tool usage, and planning and memory workflows
Develop Python-based evaluation frameworks, benchmarks, and testing utilities
Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
Validate and compare prompt strategies, retrieval strategies, and system designs
Clearly document evaluation methodologies, results, and recommendations for stakeholders

Required Skills & Experience

Strong proficiency in Python
Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
Hands-on experience with RAG systems and retrieval-based architectures
Understanding of agentic AI systems and multi-step reasoning workflows
Experience evaluating Generative AI systems in real or near-production environments
Knowledge of NLP fundamentals and LLM behavior
Experience with prompt engineering, prompt testing, and prompt evaluation

Preferred Skills

Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
Experience designing or managing human evaluation workflows
Understanding of AI safety, reliability, bias, and trustworthiness principles
Prior experience evaluating production-grade GenAI systems

Nice to Have

Experience with vector databases and retrieval pipelines
Exposure to cloud-based AI platforms
Research or experimentation background in LLM evaluation and benchmarking

Vacancy posted 1 day ago
