LLM / RAG Evaluation Engineer
Prophecy Technologies
Job Summary
We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing the quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.

Key Responsibilities
- Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
- Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
- Build and apply evaluation metrics for agentic AI systems, including:
  - Multi-step reasoning
  - Tool usage
  - Planning and memory workflows
- Develop Python-based evaluation frameworks, benchmarks, and testing utilities
- Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
- Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
- Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
- Validate and compare prompt strategies, retrieval strategies, and system designs
- Clearly document evaluation methodologies, results, and recommendations for stakeholders

Qualifications
- Strong proficiency in Python
- Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
- Hands-on experience with RAG systems and retrieval-based architectures
- Understanding of agentic AI systems and multi-step reasoning workflows
- Experience evaluating Generative AI systems in real or near-production environments
- Knowledge of NLP fundamentals and LLM behavior
- Experience with prompt engineering, prompt testing, and prompt evaluation
- Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
- Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
- Experience designing or managing human evaluation workflows
- Understanding of AI safety, reliability, bias, and trustworthiness principles
- Prior experience evaluating production-grade GenAI systems
- Experience with vector databases and retrieval pipelines
- Exposure to cloud-based AI platforms
- Research or experimentation background in LLM evaluation and benchmarking
Vacancy posted 1 day ago