Founding Machine Learning Engineer - Evaluation
Established Search
Senior ML Engineer, Medical Imaging Evaluation & AI Reliability
About the Role:
My client is building evaluation and evidence infrastructure for safety-critical AI systems, starting with diagnostic medical imaging.
AI systems are increasingly used in settings where their outputs affect clinical decisions and patient outcomes. In medical imaging, benchmark accuracy alone is not enough. Hospitals, regulators, and clinical stakeholders need evidence that models will behave reliably across real-world deployment environments, populations, scanners, and workflows.
This role sits at the intersection of:
- medical imaging AI
- model robustness and evaluation
- regulatory evidence generation
- real-world deployment behavior
The work is highly investigative and requires strong technical judgment, scientific reasoning, and the ability to operate effectively in ambiguous environments.
The Role
This is not a traditional “train models on benchmark datasets” ML role.
You will work directly with medical imaging companies and healthcare stakeholders to investigate how AI systems behave in practice and what evidence is required for deployment, regulatory, and clinical decisions.
You will:
- Design and execute evaluations for medical imaging AI systems
- Investigate model failure modes, robustness, and generalization gaps
- Analyze behavior across populations, scanners, imaging protocols, and clinical settings
- Determine what evidence is sufficient for stakeholders making deployment or regulatory decisions
- Translate technical findings into actionable recommendations for customers and clinical stakeholders
- Build reusable evaluation pipelines, evidence schemas, and model assessment frameworks
- Work with messy, incomplete, and noisy real-world clinical data
- Help shape how evaluation investigations are conducted across the organization
The important work is not simply running experiments. It is identifying what questions actually matter, what evidence is missing, and how to generate defensible conclusions under real-world constraints.
Required Qualifications:
- Strong experience in machine learning for medical imaging (radiology, pathology, cardiology imaging, or related domains)
- Experience evaluating or validating real-world ML systems, not just training models
- Deep understanding of model robustness, distribution shift, uncertainty, failure analysis, and real-world deployment behavior
- Strong Python skills across the full investigation workflow: data analysis, experimentation, evaluation, and reporting
- Experience working with noisy or imperfect clinical datasets
- Ability to communicate technical findings clearly to both technical and non-technical stakeholders
- High tolerance for ambiguity and open-ended investigative work
Strongly Preferred:
- Experience with FDA-regulated AI/ML systems or medical device submissions (510(k), De Novo, SaMD, etc.)
- Experience with medical imaging deployment evaluation or clinical validation
- Experience with interpretability, post-deployment monitoring, uncertainty estimation, or model auditing
- Experience designing reproducible evaluation frameworks or benchmarking systems
- Background in healthcare AI or other safety-critical ML domains
- Customer-facing or cross-functional technical leadership experience
- PhD or equivalent research depth in ML, medical imaging, computer vision, or related areas
Ideal Candidate Profile
Candidates who tend to succeed in this role often come from backgrounds such as:
- Medical imaging ML research
- FDA or healthcare AI evaluation
- Clinical AI validation
- AI robustness and reliability research
- Applied ML investigation in safety-critical environments
- Healthcare-focused computer vision research
What Success Looks Like:
The strongest people in this role become experts in how medical AI systems behave in the real world.
They develop the judgment to answer questions such as:
- Where are the model’s true weaknesses?
- Which deployment conditions introduce risk?
- What concerns are real versus theoretical?
- What evidence is sufficient for a hospital or regulator to trust the system?
- What additional validation is required before deployment proceeds?


