AI Evaluation Engineer

$180k

DeepRec.ai

AI Evaluation Engineer, $180,000, Remote (US-based)

Are you passionate about shaping how AI is deployed safely, reliably, and at scale? This is a rare opportunity to join a mission-driven tech company as its first AI Evaluation Engineer, a foundational role in which you’ll design, build, and own the evaluation systems that safeguard every AI-powered feature before it reaches the real world.

This organization builds AI-enabled products that directly help governments, nonprofits, and agencies deliver financial support to the people who need it most. As AI capabilities race forward, ensuring these systems are safe, accurate, and resilient is critical. That’s where you come in.

You won’t just be testing models. You’ll be creating the frameworks, pipelines, and guardrails that make advanced LLM features safe to ship. You’ll collaborate with engineers, PMs, and AI safety experts to stress test boundaries, uncover weaknesses, and design scalable evaluation systems that protect end users while enabling rapid innovation.

What You’ll Do
  • Own the evaluation stack: design frameworks that define “good,” “risky,” and “catastrophic” outputs.
  • Automate at scale: build data pipelines and LLM judges, and integrate them with CI to block unsafe releases.
  • Stress testing: red-team AI systems with challenge prompts to expose brittleness, bias, or jailbreaks.
  • Track and monitor: establish model/prompt versioning, build observability, and create incident response playbooks.
  • Empower others: deliver tooling, APIs, and dashboards that put eval into every engineer’s workflow.

Requirements
  • Strong software engineering background (TypeScript a plus)
  • Deep experience with the OpenAI API or similar LLM ecosystems
  • Practical knowledge of prompting, function calling, and eval techniques (e.g. LLM grading, moderation APIs)
  • Familiarity with statistical analysis and validating data quality/performance
  • Bonus: experience with observability, monitoring, or data science tooling

Seniority level: Not Applicable
Employment type: Full-time
Job function: Information Technology
Technology, Information and Media, Information Services, and Software Development

Vacancy posted 10 hours ago
Similar jobs that could be interesting for you
Based on the AI Evaluation Engineer in Denver, CO vacancy
  • A cybersecurity-focused company is seeking experienced professionals to evaluate AI-generated security content and solve real-world cybersecurity problems. This flexible role allows you to work remotely from various locations, engaging in projects that enhance AI systems... 
    Suggested
    Remote work
    Flexible hours

    DataAnnotation

    Denver, CO
    2 days ago
  • $116k - $175k

     ...continuous professional development. The Prompt + Skills Engineer is the hands-on builder in Cherry Bekaert's AI Center of Excellence - the person who writes the...  ...gets done. Participates in use case intake, evaluating submitted ideas from across the Firm for technical... 
    Suggested
    Work experience placement
    Local area

    Cherry Bekaert

    Denver, CO
    2 days ago
  • Alignerr is seeking a Python Infrastructure Engineer to shape advanced AI models through design and implementation of data pipelines. This fully...  ...systems programming skills, and a background in ML model evaluation. Join us for meaningful projects and potentially ongoing... 
    Suggested
    Remote job
    Contract work

    Alignerr

    Denver, CO
    1 day ago
  •  ...helping them shape their hybrid cloud and AI journeys. With support from our strategic...  ...As an AI Forward Deployed Engineer, you will work with customers to understand...  ...practices throughout deployment and adoption. Evaluate Model Performance: Assess the effectiveness... 
    Suggested
    Worldwide

    IBM Computing

    Denver, CO
    20 hours ago
  • A leading AI research accelerator is looking for a contractor to evaluate AI-generated code and enhance AI-driven coding solutions. The ideal candidate will have over 5 years of software engineering experience, including time at a top-tier company, and possess strong skills... 
    Suggested
    Contract work
    For contractors
    Remote work
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    4 days ago
  • $150k - $184k

     ...per week. The Opportunity The Generative AI Innovation Team is transforming how Litera...  ...competitive advantage. As a Senior AI Engineer, you will play a critical role in shaping...  ...technical direction across engineering. Evaluate, prototype, and document emerging tools,... 
    Work experience placement
    3 days per week

    FREEDOM SOLUTIONS GR

    Denver, CO
    3 days ago
  • $110k - $140k

     ...Citizenry, The Inside, and St. Frank. We are looking for an engineer who views AI as their primary superpower and who cares more about the...  ...experiences across our ecosystem, including defining AI behavior, evaluating output quality, and iterating based on real customer impact... 
    Full time
    Work at office
    2 days per week
    3 days per week

    Havenly

    Denver, CO
    3 days ago
  • $128.75k - $160.68k

     ...Description Connect for Health has a great opportunity for an AI Developer/Engineer. Connect for Health Colorado is a public, non-profit...  ...AI-related testing tools and frameworks (prompt/model evaluation, regression testing) to enhance testing efficiency and accuracy... 
    Full time
    Temporary work
    Work at office
    Remote work
    Weekend work
    Afternoon shift

    Connect for Health Colorado

    Denver, CO
    20 hours ago
  • $55 per hour

    Freelance AI Trainer - Civil Engineering & Python 1 day ago Be among the first 25 applicants This opportunity is only for candidates currently residing...  ...Civil Engineers with Python skills to train and evaluate AI models on realistic civil engineering problems. This role... 
    Part time
    Freelance
    Remote work
    Flexible hours

    Mindrift

    Denver, CO
    3 days ago
  • $120k - $250k

    Founded in 2016 in Silicon Valley, Pony.ai has quickly become a global leader in autonomous...  ..., curation, and multi-dimensional evaluation. Design and implement high-performance...  .... Build and optimize downstream engineering workflows for Large Language Models (LLMs... 
    Temporary work

    pony.ai

    Denver, CO
    4 days ago
  • $121.8k - $152.3k

     ...wherever we go next. You create your future and ours. AI Data Engineer JOB SUMMARY Working under general direction, designs...  ...outcomes. The role emphasizes data semantics, system context, evaluation, and collaboration across engineering, product, and governance... 
    Work at office

    Terumo BCT

    Lakewood, CO
    20 hours ago
  • A leading AI research accelerator is seeking a Software Engineer with over 5 years of experience. The role involves evaluating AI-generated code and working with various teams to enhance coding solutions. Candidates must have strong full-stack development skills and excellent... 
    Contract work
    For contractors
    Remote work
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    4 days ago
  • $90k - $105k

     ...Senior Life Sciences Knowledge Engineer Company: Norstella Location: Remote, United...  ...and critical global life sciences data and AI solutions provider dedicated to improving...  ...unites market-leading brands - Citeline, Evaluate, MMIT, Panalgo, Skipta and The Dedham Group... 
    Full time
    Temporary work
    Work at office
    Local area
    Remote work
    Flexible hours

    Norstella

    Denver, CO
    22 hours ago
  • $265k - $285k

     ...Principal AI Engineer DriveWealth is on a mission to make investing easier. We believe that everyone should have the ability to control...  ...stores, vector databases, model input/output pipelines and evaluation datasets Partner with data engineers, analysts and product... 
    Full time
    Work at office
    Worldwide

    DriveWealth

    Denver, CO
    3 days ago
  •  ...AI Platform Engineer We require people to be on-site, four days per week at our Denver or NYC office and are unable to offer relocation...  ...end to end stack; LLM serving and inference, RAG pipelines, evaluation harnesses and the APIs and infrastructure that put agents in... 
    Temporary work
    Work at office
    Local area
    Relocation package

    LG Ad Solutions

    Denver, CO
    2 days ago
  • $91k - $321.5k

     ...At PwC, our people in data and analytics engineering focus on leveraging advanced technologies...  ...will lead the development of innovative AI solutions that drive operational...  ...collaborating closely with team members. We evaluate these factors thoughtfully to establish a... 
    Full time
    H1b

    PwC

    Denver, CO
    4 days ago
  • $96.25k - $137.5k

     ...development, technology innovation or solution engineering, our team members play a vital role in...  ...(Objectives) Implement complex AI workflows using frameworks like...  ...infrastructure. Establish rigorous evaluation frameworks to measure model performance... 
    Local area
    Flexible hours

    EchoStar

    Littleton, CO
    10 days ago
  • $80 per hour

     ...Job Description AI & Machine Learning Engineer - AI Training About Prolific Prolific is not just another player in the AI...  ...Engineers to join our Expert Network to help train and evaluate the next generation of LLMs using deep technical expertise... 
    Hourly pay
    Work from home
    Flexible hours

    Prolific Academic Ltd

    Denver, CO
    20 days ago
  • $40 per hour

    A cybersecurity innovation firm is seeking experienced professionals to evaluate AI-generated content and solve technical problems in a remote environment. Candidates should have 2+ years of hands-on experience in cybersecurity, including areas like penetration testing... 
    Remote job
    Hourly pay
    Full time
    Part time
    Flexible hours

    DataAnnotation

    Denver, CO
    2 days ago
  • A leading AI research accelerator is seeking a skilled software engineer to evaluate AI-generated code and improve its efficiency and reliability. The role involves collaboration with cross-functional teams to enhance coding solutions, requiring a minimum of 5 years of... 
    Remote job
    Contract work
    For contractors
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    2 days ago
  • $60 per hour

     ...company is seeking proficient programmers to contribute to innovative AI systems with remote work flexibility. As part of the coding team, you will solve coding challenges, write quality code, and evaluate AI systems. Ideal candidates should be fluent in English,... 
    Remote job
    Flexible hours

    DataAnnotation

    Denver, CO
    3 days ago
  • A tech company specializing in AI is seeking proficient programmers to join their remote team. As part of a broader coding community...  ...tasks, focusing on designing AI training problems and evaluating AI-generated code. Ideal candidates should have strong English fluency... 
    Remote job
    Flexible hours

    DataAnnotation

    Denver, CO
    4 days ago
  • A leading AI research accelerator is looking for a contractor with over 3 years of software engineering experience to evaluate and refine AI-generated code. You'll collaborate with teams to enhance AI-driven coding solutions and will need a strong understanding of software... 
    Remote job
    For contractors
    10 hours per week
    Flexible hours

    Turing

    Denver, CO
    2 days ago
  •  ...The role We're hiring two senior full-stack engineers to build the next generation of Ombud's agentic AI platform. The work splits across two domains: building...  .../ agent orchestration / tool calling / evaluation pipelines) and the ML data engineering that supports... 
    Work at office

    Ombud

    Denver, CO
    2 days ago
  • $108.5k - $163.2k

     ...the Role We are building enterprise‑grade AI capabilities to operate kidney care services at scale. The Lead AI Software Engineer sits within Strive’s AI Center of...  ...emerging AI capabilities and thoughtfully evaluates when and how to apply them in a production... 
    Work at office
    Remote work
    Work from home
    Flexible hours

    Strive Health

    Denver, CO
    3 days ago
  • $155k - $235k

     ...Catalog in Databricks, to the semantic and AI layers that sit on top. This high‑impact...  ...standards, and ensuring data works for engineers, analysts, and business users alike. About...  ...how modern LLMs are trained, aligned and evaluated (RLHF, fine‑tuning, prompt engineering,... 
    Home office
    Flexible hours

    Scribd, Inc.

    Denver, CO
    4 days ago
  • $106k - $159k

     ...Role description Senior AI/ML Engineer Lead II - ML Engineering Who We Are: Born digital, UST transforms lives...  ...(data ingestion, training, validation, deployment) * Evaluate and improve model performance through experimentation and tuning... 
    Full time
    Temporary work
    Part time
    Work at office
    Local area
    Flexible hours

    UST Inc

    Denver, CO
    1 day ago
  •  ...Senior AI/ML Engineer Anywhere Type: Contract-to-Hire Category: Development Industry: Government Workplace Type: Remote...  ...match the requirements of the position. All AI-assisted evaluations and responses are reviewed by human recruiters before any hiring... 
    Hourly pay
    Permanent employment
    Contract work
    Local area
    Remote work

    Eliassen Group

    Denver, CO
    3 days ago
  • $200k - $300k

     ...with the core digital systems, specialized AI, and data-driven foundation to eliminate...  ...is high. We're looking for an AI Engineering Leader who leads from the front. You'll...  ...on leader with expertise in designing, evaluating, and shipping AI based features. This... 
    Contract work
    Work at office

    Vertafore

    Denver, CO
    12 days ago
  • $40 per hour

    A leading AI cybersecurity firm is looking for experienced cybersecurity professionals to join their team. In this remote role, you will evaluate AI-generated security content, solve technical problems, and provide feedback to enhance AI reasoning about real-world cybersecurity... 
    Remote job
    Hourly pay
    Flexible hours

    DataAnnotation

    Denver, CO
    2 days ago
