AI Evaluations Engineer - Healthcare

$150k - $180k

Ellipsis Health

Location: Remote (must be located in the US)
Type: Full-time
Department: Engineering
Reports to: Director of Engineering

Responsibilities
- Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including an automated testing platform for AI voice agents and debugging and observability tools.
- Develop and productionize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails.
- Partner with ML, engineering, and QA teams to translate evaluation requirements into robust, maintainable infrastructure and tooling.
- Improve developer experience by making evaluation systems easy to extend, well documented, and reliable in day-to-day use.
- Ensure evaluation tooling meets production standards for reliability, performance, and maintainability.

Qualifications
- 5+ years of professional software engineering experience, with a strong focus on building backend systems, platforms, or developer tooling.
- Proven experience designing and maintaining production-grade infrastructure with code, including APIs, services, and data pipelines.
- Experience using test automation frameworks, evaluation pipelines, or CI/CD-integrated testing systems.
- Familiarity with observability and debugging tools (logging, metrics, tracing) and with building internal tools that improve developer and QA workflows.
- Strong debugging skills and a methodical approach to diagnosing production and evaluation issues.
- Ability to collaborate effectively across engineering, QA, and operations teams, translating requirements into reliable, maintainable systems.
- A product-minded approach to infrastructure, with attention to usability, documentation, and long-term maintainability.

Preferred
- Experience working with complex, multi-component systems (e.g., ASR, LLMs, TTS, or other ML-powered services).
- Experience working in healthcare or other regulated environments, including awareness of HIPAA and PHI handling.
- Familiarity with conversational AI or voice agents, including multi-turn dialogue, latency constraints, and error recovery.
- Familiarity with LLM observability or evaluation tools (e.g., Langfuse, prompt eval frameworks).
- Background in digital health, care coordination, or patient-facing systems.

As a health technology company, we reserve the right to run background checks on candidates to whom we extend offers, in compliance with applicable laws. We evaluate candidates holistically and comply with all "ban the box" regulations.

Salary and Benefits
We offer a competitive salary and benefits, including 401(k) matching, health, vision, and dental insurance, and very flexible paid time off. The typical salary range for this role is $150,000 to $180,000 USD, depending on skills, qualifications, and relevant experience.

Assistance
If you have a disability or require accommodations during the application or recruitment process, please contact the company at the email address listed on click.appcast.io.


Vacancy posted 4 days ago

Similar jobs that may interest you, based on the AI Evaluations Engineer - Healthcare vacancy in San Francisco, CA

Remote AI Evaluations Engineer for Healthcare Systems
$150k - $180k
...A health technology company is seeking a skilled infrastructure engineer to build and maintain AI evaluation tooling. The ideal candidate has over 5 years of experience in software engineering with a focus on backend systems and production-grade infrastructure. This role...
Remote work
Flexible hours
Ellipsis Health
San Francisco, CA
4 days ago
AI Engineer
$150k - $250k
...Max AI – Stripe for Healthcare. Max AI is the world’s first human-free, fully autonomous medical billing... ...for over 10 years. And our Head of Engineering was one of the earliest engineers at... ...Build, experiment, and evaluate AI agents and ML models in the NLP domain...
Maxcare
San Francisco, CA
2 days ago
AI Engineer
$150k - $350k
...About Collate: Collate is an AI document generation platform... ...Lever. Our AI researchers, engineers, and designers have worked at... ...Y Combinator) and leaders in healthcare and AI. This is a rare chance... ...define the standards for how we evaluate, and deploy models that...
Collate
San Francisco, CA
2 days ago
AI Evaluation Engineer
$130k - $220k
...The opportunity summary by the Joinrs AI: The selection process will be fully... ...benchmarking and insights company. They help engineers, enterprises, investors, media, and... ...Is: This role is best described as an AI Evaluation Engineer / Technical Generalist. It is not...
Full time
Worldwide
Aurora Jobs ApS
San Francisco, CA
8 hours ago
AI Engineer, Evaluation
$150k - $250k
...Distyl AI Job Posting: Distyl is an applied AI technology... ...largest companies in telecom, healthcare, insurance, manufacturing, consumer... ..., we build AI systems using Evaluation-Driven Development, an... ...production. AI Evaluation Engineers focus on designing and implementing...
Work at office
3 days per week
Distyl AI
San Francisco, CA
1 day ago
AI Engineer: Medical Billing NLP for Healthcare
$150k - $250k
...A cutting-edge healthcare tech company is seeking an AI Engineer to build and evaluate AI agents, particularly in medical billing. The role requires a strong background in ML, NLP, and Python packages such as TensorFlow and PyTorch, with at least 6 years of industry experience...
MaxCare
San Francisco, CA
4 days ago
Senior AI Engineer
$170k - $200k
...Team is building a new kind of healthcare system across Medicaid,... ...public benefit corporation and AI-enabled medical group, we partner... ...The Role As a Senior AI Engineer, you will help design and build... ...implementation through observability, evaluation, and continuous improvement...
Temporary work
Local area
Work from home
Flexible hours
Pair Team
San Francisco, CA
4 days ago
Senior Software Engineer, AI Engineer
$170k - $210k
...Senior Software Engineer, AI Engineer (Hybrid - SF Bay Area). About Midi Health: Midi... ...has historically been underserved by the healthcare system. We're a fast-growing, mission-... ...realities of healthcare-grade safety and evaluation. What You Will Do: Design,...
Work at office
Immediate start
Remote work
Shift work
2 days per week
Midi Health
San Francisco, CA
1 day ago
Remote AI Engineer, Quality & Evaluation at Enterprise Scale
...A pioneering AI technology firm based in San Francisco is seeking an AI Engineer to own the evaluation infrastructure for AI agents. This role requires designing automated pipelines and building observability systems, ensuring agent performance meets enterprise standards...
Remote work
Flexible hours
Fieldguide.ai
San Francisco, CA
3 days ago
AI Evaluation Engineer Data-Driven Contract Intelligence
...Ironclad, located in San Francisco, is seeking an AI Evaluation Engineer to join their team. This role involves analyzing datasets, designing feedback loops, and partnering closely with AI Engineers to improve model quality. Applicants should have 8+ years of experience...
Contract work
Ironclad Inc
San Francisco, CA
4 days ago
AI Engineer (Hybrid - San Francisco, CA)
$175k - $250k
...AI Engineer (Hybrid - San Francisco, CA) We are currently supporting a new client based... ...is building next generation AI powered healthcare workflow solutions. They are looking for... ..., and tool usage. Build evaluation frameworks and feedback loops for model...
Full time
OMG Technologies Inc
San Francisco, CA
12 hours ago
AI Engineer
...AI Engineer Role at Roger. Roger is an AI platform that frees home health clinicians from... ...of their homes. Backed by leading healthcare investors like SignalFire, we've powered... ...Experience training, fine-tuning, or evaluating LLMs and open source models, with real...
Remote work
Work from home
Flexible hours
Roger Healthcare
San Francisco, CA
1 day ago
Staff/Senior Agentic AI Engineers (Multiple roles)
$180k - $215k
...technology. The flagship product—an AI-driven, non-invasive cardiac... ...for exceptional strides in healthcare innovation, is supported by... ...and greenfield product engineering. You won't just be consuming... ...Implement advanced guardrails, evaluation frameworks, and reasoning validation...
Local area
Worldwide
Relocation
HeartFlow
San Francisco, CA
1 day ago
Scientific Lead - Forward Deployed AI Engineer, Applied Intelligence for Discovery
$166.5k - $266.2k
...around the world. We are a global healthcare leader headquartered in... ...something unprecedented — an AI foundation that will push the... ...repeatable system standards and evaluation practices that scale across... ...areas. The Forward Deployed AI Engineer is the connective tissue...
Full time
Flexible hours
Eli Lilly
San Francisco, CA
3 days ago
Scientific Lead, Generative AI Engineer, Applied Intelligence for Discovery
$181.5k - $283.8k
...Generative AI Engineer At Lilly, we unite caring with discovery to make life better for... ...people around the world. We are a global healthcare leader headquartered in Indianapolis,... ...deployments into repeatable system standards and evaluation practices that scale across therapeutic...
Full time
Flexible hours
Eli Lilly
San Francisco, CA
3 days ago
Applied AI Engineer
...About Luminai: Healthcare operations have always depended on people... ...By delegating to autonomous AI systems those mission-... ...the role: As a Software Engineer working on AI systems, you will... ...benchmarks. Design and improve evaluation frameworks to accelerate the...
Work at office
Worldwide
3 days per week
Luminai, Inc
San Francisco, CA
12 hours ago
Senior Applied AI Engineer
...Care is building the leading AI-native platform for family-led... ...transparency across the healthcare system. Abby Care combines... ...looking for a Senior Applied AI Engineer to build production-grade AI... ...internal platforms. Design evaluation frameworks, datasets, metrics...
Full time
Abby Care
San Francisco, CA
3 days ago
AI Benchmarking Engineer — Evaluations & Failure Analysis
A cutting-edge AI firm in San Francisco is seeking a Research Engineer to develop evaluation systems and benchmarking pipelines for language models. Candidates should have a strong background in applied research, coding skills, and familiarity with ML models. You will work...
Mercor
San Francisco, CA
4 days ago
AI Evaluation Engineer for Coding Agents
...Compensation: Not listed. Posted April 25, 2026. Required skills: AI evaluation, data pipelines, agent instrumentation. Requirements: Mid/Senior. Visa sponsorship: Not mentioned. Relocation: Not mentioned. About the Role...
Relocation
Visa sponsorship
Repovive, Inc.
San Francisco, CA
4 days ago
Senior AI Evaluation Engineer — Metrics & Data Pipelines
$240k - $280k
A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably...
Sentry
San Francisco, CA
1 day ago
AI Full Stack Engineer
...Help Build the Future of Healthcare AI. Infinitus is AI communications in service of patients... ...We are looking for a Backend Software Engineer to help us design, build, and scale the... ...human-centered product experiences. Evaluate and integrate AI technologies,...
Temporary work
Work at office
Infinitus LLC
San Francisco, CA
1 day ago
Senior AI Engineer - Voice & Agentic Systems
...beneficiaries with a dedicated healthcare advocate who navigates... ...support of caring nurses while AI agents handle the tedious backend... ...We're looking for an AI engineer to own the loop that turns real... ...learning pipeline. Set up the evaluation infrastructure that measures...
Full time
Mira Mace
San Francisco, CA
2 days ago
Applied AI Software Engineer
...medical records (EMR) and payments development platform for healthcare. We build modern, elegant front- and back-end tooling... ...Hers Health). The Role We’re hiring an Applied AI Software Engineer to lead evaluations for agents in development and the post-deployment...
Remote work
Home office
Flexible hours
Canvas Medical
San Francisco, CA
4 days ago
AI Model Behavior Engineer—Quality & Evaluation
...located in San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops, strategy, and analytics to... ...leading labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and benefits offered, with a...
Notion
San Francisco, CA
1 day ago
Staff AI Evaluations Engineer — Open Foundation Models
B Capital seeks a talented individual for an AI Evaluation role in San Francisco. This position involves conducting critical comparative analysis, refining evaluation systems, and collaborating with various teams to enhance model capabilities. The ideal candidate will have...
B Capital
San Francisco, CA
1 day ago
AI Model Evaluation Engineer — Benchmarking & Validation
A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity...
SpreeAI
San Francisco, CA
4 days ago
AI Risk & Fraud Evaluation Engineer
...A technology firm in San Francisco is seeking a Research Engineer to enhance AI model quality. The ideal candidate will build benchmarks, datasets, and evaluation loops to ensure effective performance on critical tasks. This role requires strong programming skills and...
Variance
San Francisco, CA
4 days ago
AI Coding Systems Engineer: Full-Stack Evaluator
...Nerdleveltech is seeking a Software Engineering evaluator based in San Francisco, California. In this contractor role, you will create datasets for training and evaluating AI models by curating code examples and refining AI-generated solutions across various programming...
For contractors
10 hours per week
Flexible hours
Nerdleveltech
San Francisco, CA
3 days ago
Applied AI Research Engineer — RAG & Evaluation
$192k - $237.1k
A leading compliance software company in San Francisco is seeking an Applied AI Engineer to innovate compliance automation through applied research and evaluation. This role emphasizes experimentation over production engineering, requiring strong skills in information retrieval...
Drata
San Francisco, CA
12 hours ago
Senior AI Research Engineer: RAG, Evaluation & GenAI
Cacheflow is seeking a Senior Applied Research Engineer to enhance AI systems through rigorous experimentation and applied research. This research... ...The individual will design information access strategies, evaluate innovative methodologies, and collaborate closely with...
Flexible hours
Cacheflow
San Francisco, CA
1 day ago
