AI Evaluations Engineer - Healthcare
Ellipsis Health
$150k - $180k
Location: Remote, located in the US
Type: Full-time
Department: Engineering
Reports to: Director of Engineering

Responsibilities
- Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including an automated testing platform for AI voice agents and debugging and observability tools.
- Develop and productionize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails.
- Partner with ML, engineering, and QA teams to translate evaluation requirements into robust, maintainable infrastructure and tooling.
- Improve developer experience by making evaluation systems easy to extend, well-documented, and reliable in day-to-day use.
- Ensure evaluation tooling meets production standards for reliability, performance, and maintainability.

Qualifications
- 5+ years of professional software engineering experience, with a strong focus on building backend systems, platforms, or developer tooling.
- Proven experience designing and maintaining production-grade infrastructure with code, including APIs, services, and data pipelines.
- Experience using test automation frameworks, evaluation pipelines, or CI/CD-integrated testing systems.
- Familiarity with observability and debugging tools (logging, metrics, tracing) and with building internal tools that improve developer and QA workflows.
- Strong debugging skills and a methodical approach to diagnosing production and evaluation issues.
- Ability to collaborate effectively across engineering, QA, and operations teams, translating requirements into reliable, maintainable systems.
- Product-minded approach to infrastructure, with attention to usability, documentation, and long-term maintainability.

Preferred
- Experience working with complex, multi-component systems (e.g., ASR, LLMs, TTS, or other ML-powered services).
- Experience working in healthcare or other regulated environments, including awareness of HIPAA and PHI handling.
- Familiarity with conversational AI or voice agents, including multi-turn dialogue, latency constraints, and error recovery.
- Familiarity with LLM observability or evaluation tools (e.g., Langfuse, prompt eval frameworks).
- Background in digital health, care coordination, or patient-facing systems.

As a health technology company, we reserve the right to run background checks on candidates to whom we extend offers, in compliance with applicable laws. We evaluate candidates holistically and comply with all “ban the box” regulations.

Salary and Benefits
We offer competitive salary and benefits, including 401(k) matching; health, vision, and dental insurance; and very flexible paid time off. The typical salary range for this role is $150,000 to $180,000 USD, depending on skills, qualifications, and relevant experience.

Assistance
If you have a disability or require accommodations during the application or recruitment process, please contact the company (email address viewable on click.appcast.io).