Founding Machine Learning Engineer

Medical Imaging AI Evaluation, Reliability & Evidence Infrastructure

About the Opportunity

My client is building the infrastructure layer for evaluating and validating safety-critical AI systems. As AI becomes increasingly embedded in clinical workflows, benchmark performance alone is no longer enough. Healthcare providers, regulators, insurers, and patients need evidence that AI systems behave reliably across real-world environments, populations, scanners, and workflows.

This company is working with leading medical imaging AI organizations and healthcare institutions to redefine how AI validation is performed, moving beyond static testing towards continuous evidence generation and monitoring.

Their goal is to build the systems, methodologies, and tooling that allow organizations to understand how models behave in practice, identify risk, and generate defensible evidence for deployment and regulatory decisions.

The Role

This is not a traditional machine learning engineering role.

You will not spend your time simply training models or chasing benchmark improvements.

Instead, you will investigate how AI systems behave in real-world environments, determine where validation approaches break down, identify sources of risk, and help define what evidence is required to support safe deployment.

The work sits at the intersection of:

Medical Imaging AI
Machine Learning Evaluation
Model Robustness & Reliability
AI Safety & Validation
Regulatory Evidence Generation
Software Engineering

As one of the earliest technical hires, you will play a key role in shaping both the product and the methodology used to evaluate safety-critical AI systems.

Investigate Model Behaviour & Performance

Design and execute evaluations for medical imaging AI systems
Analyze performance across populations, institutions, scanners, imaging protocols, and clinical workflows
Investigate failure modes, robustness limitations, and generalization gaps
Evaluate distribution shift, demographic bias, subgroup performance, and deployment risks
Produce evidence that supports, challenges, or refines claims about model performance and safety

Develop AI Validation Methodology

Define frameworks for structuring claims, arguments, and evidence
Determine what evidence is sufficient for deployment, regulatory, and clinical decision-making
Challenge assumptions and identify weaknesses in existing validation approaches
Transform recurring investigations into repeatable workflows and reusable methodologies
Help establish best practices for evaluating safety-critical AI systems

Build Product & Evaluation Infrastructure

Write production-quality Python code supporting evaluation workflows
Develop reusable investigation pipelines and benchmarking frameworks
Build agentic workflows that automate evidence generation and analysis
Prototype customer-facing functionality using modern AI development tools
Collaborate with customers, researchers, clinicians, and regulatory stakeholders
