LLM / RAG Evaluation Engineer

Prophecy Technologies

Job Summary

We are seeking an experienced LLM / RAG Evaluation Engineer to design, implement, and scale evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and agentic AI workflows. This role focuses on assessing the quality, reliability, safety, robustness, and performance of production-grade Generative AI systems used in real-world applications.

Key Responsibilities
  • Design and execute LLM response evaluation pipelines, including automated and human-in-the-loop approaches
  • Evaluate RAG systems for retrieval accuracy, grounding, relevance, and hallucination detection
  • Build and apply evaluation metrics for agentic AI systems, including:
    • Multi-step reasoning
    • Tool usage
    • Planning and memory workflows
  • Develop Python-based evaluation frameworks, benchmarks, and testing utilities
  • Analyze model outputs, identify failure modes, and provide actionable insights to improve system performance
  • Define and track KPIs for Generative AI systems, covering quality, safety, robustness, and trustworthiness
  • Collaborate with ML engineers, researchers, and product teams to improve GenAI architectures
  • Validate and compare prompt strategies, retrieval strategies, and system designs
  • Clearly document evaluation methodologies, results, and recommendations for stakeholders
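
The responsibilities above include evaluating retrieval accuracy and comparing retrieval strategies. To illustrate that work, here is a minimal Python sketch of two common ranking metrics: recall@k and mean reciprocal rank (MRR). The sketch assumes that each evaluation query comes with a hand-labelled set of relevant document IDs. The function names, the document IDs, and the choice of these two metrics are illustrative assumptions, not the employer's actual framework.

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document IDs found in the top-k results."""
    if not relevant:
        return 0.0
    hits = set(retrieved[:k]) & relevant  # set() so duplicate IDs are not double-counted
    return len(hits) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> Dict[str, float]:
    """Average recall@k and MRR over (retrieved_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    n = len(runs)
    recall = sum(recall_at_k(ret, rel, k) for ret, rel in runs) / n
    mrr = sum(reciprocal_rank(ret, rel) for ret, rel in runs) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Hypothetical labelled queries: (ranked IDs returned by the retriever, gold relevant IDs)
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # the only relevant doc is at rank 2
    (["d2", "d5", "d9"], {"d4", "d5"}),  # one of two relevant docs, at rank 2
]
print(evaluate_retrieval(runs, k=3))  # {'recall@3': 0.75, 'mrr': 0.5}
```

You can compare two retrieval strategies by running them over the same labelled queries and comparing these averages. Grounding and hallucination checks on the generated answer would need separate metrics.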

Required Skills & Experience
  • Strong proficiency in Python
  • Proven experience in LLM response evaluation (quality, coherence, accuracy, bias, hallucinations)
  • Hands-on experience with RAG systems and retrieval-based architectures
  • Understanding of agentic AI systems and multi-step reasoning workflows
  • Experience evaluating Generative AI systems in real or near-production environments
  • Knowledge of NLP fundamentals and LLM behavior
  • Experience with prompt engineering, prompt testing, and prompt evaluation
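
This posting mentions tool-usage evaluation for agentic systems. One way to sketch it in Python is to compare the tool calls an agent made against a reference trajectory. The ToolCall schema, the metric names, and the scoring rules below are hypothetical assumptions. Real agent traces and the team's KPIs may look quite different.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    """One tool invocation in an agent trace (hypothetical schema)."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def tool_call_scores(predicted: List[ToolCall], expected: List[ToolCall]) -> Dict[str, float]:
    """Compare an agent's tool calls with a reference trajectory (illustrative metrics)."""
    pred_names = Counter(c.name for c in predicted)
    exp_names = Counter(c.name for c in expected)
    # Multiset overlap: a tool expected once but called twice only counts once.
    overlap = sum((pred_names & exp_names).values())
    # By convention here, an empty call list scores 0.0 rather than dividing by zero.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(expected) if expected else 0.0
    # Strict check: same tools, same arguments, same order (dataclass __eq__ compares fields).
    exact = float(predicted == expected)
    return {"precision": precision, "recall": recall, "exact_match": exact}


# Hypothetical trace: the agent searched twice instead of searching, then summarizing.
expected = [ToolCall("search", {"query": "refund policy"}), ToolCall("summarize")]
predicted = [ToolCall("search", {"query": "refund policy"}), ToolCall("search", {"query": "returns"})]
print(tool_call_scores(predicted, expected))  # {'precision': 0.5, 'recall': 0.5, 'exact_match': 0.0}
```

Precision and recall tolerate extra or reordered calls, while exact_match is deliberately strict. Reporting both helps separate "chose the right tools" from "followed the expected plan".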

Preferred Skills
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex, etc.)
  • Familiarity with automated evaluation tools, benchmarks, and scoring frameworks
  • Experience designing or managing human evaluation workflows
  • Understanding of AI safety, reliability, bias, and trustworthiness principles
  • Prior experience evaluating production-grade GenAI systems

Nice to Have
  • Experience with vector databases and retrieval pipelines
  • Exposure to cloud-based AI platforms
  • Research or experimentation background in LLM evaluation and benchmarking

Vacancy posted 1 day ago
Location: Austin, TX