AI Evaluations Engineer - Healthcare

$150k - $180k

Ellipsis Health

Location: Remote, located in the US
Type: Full-time
Department: Engineering
Reports to: Director of Engineering

Responsibilities
  • Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including an automated testing platform for AI voice agents and debugging and observability tools.
  • Develop and productionize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails.
  • Partner with ML, engineering, and QA teams to translate evaluation requirements into robust, maintainable infrastructure and tooling.
  • Improve developer experience by making evaluation systems easy to extend, well documented, and reliable in day-to-day use.
  • Ensure evaluation tooling meets production standards for reliability, performance, and maintainability.

Qualifications
  • 5+ years of professional software engineering experience, with a strong focus on building backend systems, platforms, or developer tooling.
  • Proven experience designing and maintaining production-grade infrastructure in code, including APIs, services, and data pipelines.
  • Experience using test automation frameworks, evaluation pipelines, or CI/CD-integrated testing systems.
  • Familiarity with observability and debugging tools (logging, metrics, tracing) and with building internal tools that improve developer and QA workflows.
  • Strong debugging skills and a methodical approach to diagnosing production and evaluation issues.
  • Ability to collaborate effectively across engineering, QA, and operations teams, translating requirements into reliable, maintainable systems.
  • A product-minded approach to infrastructure, with attention to usability, documentation, and long-term maintainability.

Preferred
  • Experience working with complex, multi-component systems (e.g., ASR, LLMs, TTS, or other ML-powered services).
  • Experience working in healthcare or other regulated environments, including awareness of HIPAA and PHI handling.
  • Familiarity with conversational AI or voice agents, including multi-turn dialogue, latency constraints, and error recovery.
  • Familiarity with LLM observability or evaluation tools (e.g., Langfuse, prompt evaluation frameworks).
  • Background in digital health, care coordination, or patient-facing systems.

As a health technology company, we reserve the right to run background checks on candidates to whom we extend offers, in compliance with applicable laws. We evaluate candidates holistically and comply with all “ban the box” regulations.

Salary and Benefits
We offer a competitive salary and benefits, including 401(k) matching; health, vision, and dental insurance; and very flexible paid time off. The typical salary range for this role is $150,000 to $180,000 USD, depending on skills, qualifications, and relevant experience.

Assistance
If you have a disability or require accommodations during the application or recruitment process, please contact the email address shown on click.appcast.io.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the AI Evaluations Engineer - Healthcare vacancy in San Francisco, CA
  • $150k - $180k

     ...A health technology company is seeking a skilled infrastructure engineer to build and maintain AI evaluation tooling. The ideal candidate has over 5 years of experience in software engineering with a focus on backend systems and production-grade infrastructure. This role... 
    Remote work
    Flexible hours

    Ellipsis Health

    San Francisco, CA
    4 days ago
  • $150k - $250k

     ...Max AI – Stripe for Healthcare Max AI is the World’s first human-free, fully-autonomous medical billing...  ...for over 10 years. And our Head of Engineering was one of the earliest engineers at...  ...Build, experiment, and evaluate AI agents and ML models in the NLP domain... 

    Maxcare

    San Francisco, CA
    2 days ago
  • $150k - $350k

     ...About Collate Collate is an AI document generation platform...  ...Lever. Our AI researchers, engineers, and designers have worked at...  ...Y Combinator) and leaders in healthcare and AI. This is a rare chance...  ...define the standards for how we evaluate, and deploy models that... 

    Collate

    San Francisco, CA
    2 days ago
  • $130k - $220k

     ...The opportunity summary by the Joinrs AI: The selection process will be fully...  ...benchmarking and insights company. They help engineers, enterprises, investors, media, and...  ...Is This role is best described as an AI Evaluation Engineer / Technical Generalist. It is not... 
    Full time
    Worldwide

    Aurora Jobs ApS

    San Francisco, CA
    8 hours ago
  • $150k - $250k

     ...Distyl AI Job Posting Distyl is an applied AI technology...  ...largest companies in telecom, healthcare, insurance, manufacturing, consumer...  ..., we build AI systems using Evaluation-Driven Development —an...  ...production. AI Evaluation Engineers focus on designing and implementing... 
    Work at office
    3 days per week

    Distyl AI

    San Francisco, CA
    1 day ago
  • $150k - $250k

     ...A cutting-edge healthcare tech company is seeking an AI Engineer to build and evaluate AI agents, particularly in medical billing. The role requires a strong background in ML, NLP, and Python packages such as TensorFlow and PyTorch, with at least 6 years of industry experience... 

    MaxCare

    San Francisco, CA
    4 days ago
  • $170k - $200k

     ...Team is building a new kind of healthcare system across Medicaid,...  ...public benefit corporation and AI-enabled medical group, we partner...  ...The Role As a Senior AI Engineer, you will help design and build...  ...implementation through observability, evaluation, and continuous improvement... 
    Temporary work
    Local area
    Work from home
    Flexible hours

    Pair Team

    San Francisco, CA
    4 days ago
  • $170k - $210k

     ...Senior Software Engineer, AI Engineer Hybrid - SF Bay Area About Midi Health Midi...  ...has historically been underserved by the healthcare system. We're a fast-growing, mission-...  ...realities of healthcare-grade safety and evaluation. What You Will Do Design,... 
    Work at office
    Immediate start
    Remote work
    Shift work
    2 days per week

    Midi Health

    San Francisco, CA
    1 day ago
  •  ...A pioneering AI technology firm based in San Francisco is seeking an AI Engineer to own the evaluation infrastructure for AI agents. This role requires designing automated pipelines and building observability systems, ensuring agent performance meets enterprise standards... 
    Remote work
    Flexible hours

    Fieldguide.ai

    San Francisco, CA
    3 days ago
  •  ...Ironclad, located in San Francisco, is seeking an AI Evaluation Engineer to join their team. This role involves analyzing datasets, designing feedback loops, and partnering closely with AI Engineers to improve model quality. Applicants should have 8+ years of experience... 
    Contract work

    Ironclad Inc

    San Francisco, CA
    4 days ago
  • $175k - $250k

     ...AI Engineer (Hybrid - San Francisco, CA) We are currently supporting a new client based...  ...is building next generation AI powered healthcare workflow solutions. They are looking for...  ..., and tool usage. Build evaluation frameworks and feedback loops for model... 
    Full time

    OMG Technologies Inc

    San Francisco, CA
    12 hours ago
  •  ...AI Engineer Role at Roger Roger is an AI platform that frees home health clinicians from...  ...of their homes. Backed by leading healthcare investors like SignalFire, we've powered...  ...~ Experience training, fine-tuning, or evaluating LLMs and open source models, with real... 
    Remote work
    Work from home
    Flexible hours

    Roger Healthcare

    San Francisco, CA
    1 day ago
  • $180k - $215k

     ...technology. The flagship product—an AI-driven, non-invasive cardiac...  ...for exceptional strides in healthcare innovation, is supported by...  ...and greenfield product engineering. You won't just be consuming...  ...Implement advanced guardrails, evaluation frameworks, and reasoning validation... 
    Local area
    Worldwide
    Relocation

    HeartFlow

    San Francisco, CA
    1 day ago
  • $166.5k - $266.2k

     ...around the world. We are a global healthcare leader headquartered in...  ...something unprecedented — an AI foundation that will push the...  ...repeatable system standards and evaluation practices that scale across...  ...areas. The Forward Deployed AI Engineer is the connective tissue... 
    Full time
    Flexible hours

    Eli Lilly

    San Francisco, CA
    3 days ago
  • $181.5k - $283.8k

     ...Generative AI Engineer At Lilly, we unite caring with discovery to make life better for...  ...people around the world. We are a global healthcare leader headquartered in Indianapolis,...  ...deployments into repeatable system standards and evaluation practices that scale across therapeutic... 
    Full time
    Flexible hours

    Eli Lilly

    San Francisco, CA
    3 days ago
  •  ...About Luminai Healthcare operations have always depended on people...  ...By delegating to autonomous AI systems those mission-...  ...the role As a Software Engineer working on AI systems, you will...  ...benchmarks Design and improve evaluation frameworks to accelerate the... 
    Work at office
    Worldwide
    3 days per week

    Luminai, Inc

    San Francisco, CA
    12 hours ago
  •  ...Care is building the leading AI-native platform for family-led...  ...transparency across the healthcare system. Abby Care combines...  ...looking for a Senior Applied AI Engineer to build production-grade AI...  ...internal platforms. Design evaluation frameworks, datasets, metrics... 
    Full time

    Abby Care

    San Francisco, CA
    3 days ago
  • A cutting-edge AI firm in San Francisco is seeking a Research Engineer to develop evaluation systems and benchmarking pipelines for language models. Candidates should have a strong background in applied research, coding skills, and familiarity with ML models. You will work... 

    Mercor

    San Francisco, CA
    4 days ago
  •  ...Compensation: not listed. Posted April 25, 2026. Required skills: AI evaluation, data pipelines, agent instrumentation. Requirements: Mid/Senior. Visa sponsorship: not mentioned. Relocation: not mentioned. About the Role... 
    Relocation
    Visa sponsorship

    Repovive, Inc.

    San Francisco, CA
    4 days ago
  • $240k - $280k

    A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably... 

    Sentry

    San Francisco, CA
    1 day ago
  •  ...Help Build the Future of Healthcare AI Infinitus is AI communications in service of patients...  ...We are looking for a Backend Software Engineer to help us design, build, and scale the...  ...human-centered product experiences. Evaluate and integrate AI technologies,... 
    Temporary work
    Work at office

    Infinitus LLC

    San Francisco, CA
    1 day ago
  •  ...beneficiaries with a dedicated healthcare advocate who navigates...  ...support of caring nurses while AI agents handle the tedious backend...  ...We're looking for an AI engineer to own the loop that turns real...  ...learning pipeline. Set up the evaluation infrastructure that measures... 
    Full time

    Mira Mace

    San Francisco, CA
    2 days ago
  •  ...medical records (EMR) and payments development platform for healthcare. We build modern, elegant front- and back-end tooling...  ...Hers Health). The Role We’re hiring an Applied AI Software Engineer to lead evaluations for agents in development and the post-deployment... 
    Remote work
    Home office
    Flexible hours

    Canvas Medical

    San Francisco, CA
    4 days ago
  •  ...located in San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops, strategy, and analytics to...  ...leading labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and benefits offered, with a... 

    Notion

    San Francisco, CA
    1 day ago
  • B Capital seeks a talented individual for an AI Evaluation role in San Francisco. This position involves conducting critical comparative analysis, refining evaluation systems, and collaborating with various teams to enhance model capabilities. The ideal candidate will have... 

    B Capital

    San Francisco, CA
    1 day ago
  • A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity... 

    SpreeAI

    San Francisco, CA
    4 days ago
  •  ...A technology firm in San Francisco is seeking a Research Engineer to enhance AI model quality. The ideal candidate will build benchmarks, datasets, and evaluation loops to ensure effective performance on critical tasks. This role requires strong programming skills and... 

    Variance

    San Francisco, CA
    4 days ago
  •  ...Nerdleveltech is seeking a Software Engineering evaluator based in San Francisco, California. In this contractor role, you will create datasets for training and evaluating AI models by curating code examples and refining AI-generated solutions across various programming... 
    For contractors
    10 hours per week
    Flexible hours

    Nerdleveltech

    San Francisco, CA
    3 days ago
  • $192k - $237.1k

    A leading compliance software company in San Francisco is seeking an Applied AI Engineer to innovate compliance automation through applied research and evaluation. This role emphasizes experimentation over production engineering, requiring strong skills in information retrieval... 

    Drata

    San Francisco, CA
    12 hours ago
  • Cacheflow is seeking a Senior Applied Research Engineer to enhance AI systems through rigorous experimentation and applied research. This research...  ...The individual will design information access strategies, evaluate innovative methodologies, and collaborate closely with... 
    Flexible hours

    Cacheflow

    San Francisco, CA
    1 day ago
