AI Evaluation Engineer

$180k

DeepRec.ai

Salary: $180,000
Location: Remote (US-based)

Are you passionate about shaping how AI is deployed safely, reliably, and at scale? This is a rare opportunity to join a mission‑driven tech company as its first AI Evaluation Engineer, a foundational role in which you’ll design, build, and own the evaluation systems that safeguard every AI‑powered feature before it reaches the real world.

This organization builds AI‑enabled products that directly help governments, nonprofits, and agencies deliver financial support to people who need it most. As AI capabilities race forward, ensuring these systems are safe, accurate, and resilient is critical. That’s where you come in.

You won’t just be testing models; you’ll be creating the frameworks, pipelines, and guardrails that make advanced LLM features safe to ship. You’ll collaborate with engineers, PMs, and AI safety experts to stress test boundaries, uncover weaknesses, and design scalable evaluation systems that protect end users while enabling rapid innovation.

What You’ll Do
  • Own the evaluation stack: design frameworks that define “good,” “risky,” and “catastrophic” outputs.
  • Automate at scale: build data pipelines and LLM judges, and integrate with CI to block unsafe releases.
  • Stress test: red team AI systems with challenge prompts to expose brittleness, bias, or jailbreaks.
  • Track and monitor: establish model/prompt versioning, build observability, and create incident response playbooks.
  • Empower others: deliver tooling, APIs, and dashboards that put eval into every engineer’s workflow.

Requirements
  • Strong software engineering background (TypeScript a plus)
  • Deep experience with the OpenAI API or similar LLM ecosystems
  • Practical knowledge of prompting, function calling, and eval techniques (e.g. LLM grading, moderation APIs)
  • Familiarity with statistical analysis and validating data quality/performance
  • Bonus: experience with observability, monitoring, or data science tooling

Seniority level: Not Applicable
Employment type: Full-time
Job function: Information Technology
Industries: Technology, Information and Media, Information Services, and Software Development

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the AI Evaluation Engineer in Denver, CO vacancy
  • A cybersecurity-focused company is seeking experienced professionals to evaluate AI-generated security content and solve real-world cybersecurity problems. This flexible role allows you to work remotely from various locations, engaging in projects that enhance AI systems... 
    Suggested
    Remote work
    Flexible hours

    DataAnnotation

    Denver, CO
    4 days ago
  •  ...a career in Advisory. KPMG is currently seeking a Manager, AI Engineer to join our Advisory Services practice. Responsibilities:...  ...with emerging AI/ML technologies, frameworks, and tools, and evaluate their applicability for client needs Act with integrity, professionalism... 
    Suggested
    Full time
    H1b
    Local area

    KPMG

    Denver, CO
    8 days ago
  • $126.8k - $158.5k

     ...shape wherever we go next. You create your future and ours. AI Engineer JOB SUMMARY We are seeking a motivated AI Engineer...  ...MLOps practices appropriate to the solution (versioning, evaluation, monitoring signals, and safe rollout patterns). Reviews and... 
    Suggested
    Internship
    Work at office

    Terumo BCT

    Littleton, CO
    2 days ago
  • $60 per hour

     ...A leading AI development firm is seeking proficient programmers to join their remote team. You'll tackle diverse coding challenges, create applications, and provide critical evaluations of AI-generated code. Successful candidates will be fluent in English and proficient... 
    Suggested
    Remote work
    Flexible hours

    DataAnnotation

    Denver, CO
    4 hours ago
  • $55 per hour

     ...Freelance AI Trainer - Civil Engineering & Python 1 day ago Be among the first 25 applicants This opportunity is only for candidates currently...  ...Civil Engineers with Python skills to train and evaluate AI models on realistic civil engineering problems. This role... 
    Suggested
    Part time
    Freelance
    Remote work
    Flexible hours

    Mind Rift

    Denver, CO
    1 day ago
  • $170k - $200k

    AT A GLANCE RVO Health is looking for an AI Engineer to help build the next generation of AI-powered tools to accelerate our internal...  ...sources Develop and maintain prompt engineering frameworks, evaluation pipelines, and feedback loops to continuously improve assistant... 
    Full time
    Temporary work
    Work at office
    Remote work
    Monday to Friday
    Flexible hours

    Dormont Manufacturing Co

    Denver, CO
    1 day ago
  • $120k - $250k

    Founded in 2016 in Silicon Valley, Pony.ai has quickly become a global leader in autonomous...  ..., curation, and multi-dimensional evaluation. Design and implement high-performance...  .... Build and optimize downstream engineering workflows for Large Language Models (LLMs... 
    Temporary work

    pony.ai

    Denver, CO
    1 day ago
  • $73.5k - $212.28k

     ...At PwC, our people in data and analytics engineering focus on leveraging advanced technologies...  ...will lead the development of innovative AI solutions that drive remarkable client...  ...collaborating closely with team members. We evaluate these factors thoughtfully to establish a... 
    Full time
    H1b

    PwC

    Denver, CO
    2 days ago
  • $50.5k - $140k

     ...Travel Requirements: Up to 20% The Opportunity As a CTIO - AI Engineer - Experienced Associate, you will leverage advanced analytics...  ...assets, or collaborating closely with team members. We evaluate these factors thoughtfully to establish a secure and trusted workplace... 
    Full time
    H1b

    PwC

    Denver, CO
    2 days ago
  • $155k - $235k

     ...Catalog in Databricks, to the semantic and AI layers that sit on top. This high‑impact...  ...standards, and ensuring data works for engineers, analysts, and business users alike. About...  ...how modern LLMs are trained, aligned and evaluated (RLHF, fine‑tuning, prompt engineering, retrieval... 
    Home office
    Flexible hours

    Scribd

    Denver, CO
    15 hours ago
  • $90k - $105k

     ...Senior Life Sciences Knowledge Engineer Company: Norstella Location: Remote, United...  ...and critical global life sciences data and AI solutions provider dedicated to improving...  ...unites market-leading brands - Citeline, Evaluate, MMIT, Panalgo, Skipta and The Dedham Group... 
    Full time
    Temporary work
    Work at office
    Local area
    Remote work
    Flexible hours

    Norstella

    Denver, CO
    3 days ago
  •  ...of inspiration and expand your capabilities, then consider a career in Advisory. KPMG is currently seeking a Senior Associate, AI Engineer to join our Advisory Services practice. Responsibilities: Develop GenAI / LLM applications and integrations using... 
    Full time
    H1b
    Local area

    KPMG

    Denver, CO
    8 days ago
  •  ...a career in Advisory. KPMG is currently seeking a Director, AI Strategy Architect to join our Advisory Services practice....  ...development, ensuring the team stays at the forefront of AI and cloud engineering technology advancements Work closely with our Microsoft,... 
    Full time
    H1b
    Local area

    KPMG

    Denver, CO
    8 days ago
  •  ...Principal AI Systems Engineer Swimlane is redefining security operations with Agentic AI automation that empowers organizations to work...  ...alert triage, summarization, and workflow automation. Own Evaluation (Evals): Create test sets, define success metrics (accuracy... 

    Swimlane

    Denver, CO
    3 days ago
  • $123k - $212k

     ...place. As a Principal Consultant, Artificial Intelligence (AI) Engineer, you will play a leading role in shaping how our clients...  ...Search, pgvector, Pinecone, Weaviate) and LLM lifecycle concerns (evaluation, versioning, monitoring, cost). ~ Proven experience... 
    Temporary work
    Local area
    Flexible hours

    Pioneer Management Consulting

    Denver, CO
    2 days ago
  • $245k - $272k

     ...more about our Total Rewards philosophy. AI is a fundamental part of how work gets...  ...will define how Gusto builds, deploys, evaluates, and scales AI/ML systems across the company...  ...organization spanning Machine Learning Engineering, ML Platform, Risk Data Science, and AI... 
    Full time
    Work at office
    Local area
    Remote work
    2 days per week
    3 days per week

    Gusto

    Denver, CO
    1 day ago
  • A leading AI training firm is seeking an Audit Defense Specialist to improve AI models related to healthcare. This remote position involves evaluating the logic and accuracy of AI chatbot responses while ensuring medical correctness. Applicants should possess a healthcare... 
    Hourly pay
    Remote work
    Flexible hours

    DataAnnotation

    Denver, CO
    1 day ago
  • ChatGPT Jobs is seeking a Rust Developer for AI Training who will provide expert feedback on AI-generated code, ensuring accuracy and...  ...’s degree in a relevant field. You will create prompts and evaluate AI responses, optimizing the performance of AI models. Excellent... 
    Remote job
    For contractors
    Flexible hours

    ChatGPT Jobs

    Denver, CO
    5 days ago
  • $40 per hour

    A cybersecurity innovation firm is seeking experienced professionals to evaluate AI-generated content and solve technical problems in a remote environment. Candidates should have 2+ years of hands-on experience in cybersecurity, including areas like penetration testing... 
    Remote job
    Hourly pay
    Full time
    Part time
    Flexible hours

    DataAnnotation

    Denver, CO
    4 days ago
  • A leading AI research accelerator is looking for a contractor to evaluate AI-generated code and enhance AI-driven coding solutions. The ideal candidate will have over 5 years of software engineering experience, including time at a top-tier company, and possess strong skills... 
    Remote job
    Contract work
    For contractors
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    2 days ago
  • A leading AI research accelerator is seeking a skilled software engineer to evaluate AI-generated code and improve its efficiency and reliability. The role involves collaboration with cross-functional teams to enhance coding solutions, requiring a minimum of 5 years of... 
    Remote job
    Contract work
    For contractors
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    4 days ago
  • A tech company specializing in AI is seeking proficient programmers to join their remote team. As part of a broader coding community...  ...tasks, focusing on designing AI training problems and evaluating AI-generated code. Ideal candidates should have strong English fluency... 
    Remote job
    Flexible hours

    DataAnnotation

    Denver, CO
    1 day ago
  • $150k - $184k

     ...per week. The Opportunity The Generative AI Innovation Team is transforming how Litera...  ...competitive advantage. As a Senior AI Engineer, you will play a critical role in shaping...  ...technical direction across engineering. Evaluate, prototype, and document emerging tools,... 
    Work experience placement
    3 days per week

    FREEDOM SOLUTIONS GR

    Denver, CO
    4 days ago
  • $110k - $140k

     ...Citizenry, The Inside, and St. Frank. We are looking for an engineer who views AI as their primary superpower and who cares more about the...  ...experiences across our ecosystem, including defining AI behavior, evaluating output quality, and iterating based on real customer impact... 
    Full time
    Work at office
    2 days per week
    3 days per week

    Havenly Inc

    Denver, CO
    2 days ago
  • $40 per hour

    A leading AI cybersecurity firm is looking for experienced cybersecurity professionals to join their team. In this remote role, you will evaluate AI-generated security content, solve technical problems, and provide feedback to enhance AI reasoning about real-world cybersecurity... 
    Remote job
    Hourly pay
    Flexible hours

    DataAnnotation

    Denver, CO
    4 days ago
  • $40 per hour

    A cybersecurity company is looking for experienced professionals to train AI models by evaluating AI-generated security content and solving technical problems. Candidates should have 2+ years of cybersecurity experience and coding knowledge, with fluency in English. This... 
    Remote job
    Hourly pay
    Full time
    Part time
    Flexible hours

    DataAnnotation

    Denver, CO
    1 day ago
  • $40 per hour

    A cybersecurity-focused company is seeking experienced cybersecurity professionals to evaluate AI-generated security content and solve technical problems in a flexible remote role. Candidates should have at least 2 years of hands-on experience in cybersecurity and some... 
    Remote job
    Hourly pay
    Flexible hours

    DataAnnotation

    Denver, CO
    4 days ago
  •  ...investing heavily in data, cloud platforms, and AI to elevate analyst productivity,...  ...technical role for someone who combines strong engineering fundamentals with genuine curiosity...  ...Python, building cloud infrastructure, evaluating open‑source tools, and sitting with a... 

    Charles Schwab

    Littleton, CO
    5 days ago
  •  ...complete cloud analytics and data platform for AI. By delivering harmonized data, trusted...  ...You'll Do We are seeking Director of AI Engineering to lead teams building Agent Platform,...  ..., knowledge bases, vector stores, evaluation frameworks). Establish best practices for... 
    Permanent employment
    Flexible hours

    Teradata Corporation (SE)

    Denver, CO
    5 days ago
  • $40 per hour

    A tech-driven cybersecurity company is seeking experienced professionals to evaluate AI-generated security content and solve technical cybersecurity problems. This remote role allows flexible scheduling and project selection, paying $40+ per hour. Candidates should have... 
    Remote job
    Hourly pay
    Flexible hours

    DataAnnotation

    Denver, CO
    2 days ago
