Location
Remote
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Founding Machine Learning Engineer
Medical Imaging AI Evaluation, Reliability & Evidence Infrastructure
About the Opportunity
My client is building the infrastructure layer for evaluating and validating safety-critical AI systems. As AI becomes increasingly embedded in clinical workflows, benchmark performance alone is no longer enough. Healthcare providers, regulators, insurers, and patients need evidence that AI systems behave reliably across real-world environments, populations, scanners, and workflows.
This company is working with leading medical imaging AI organizations and healthcare institutions to redefine how AI validation is performed, moving beyond static testing towards continuous evidence generation and monitoring.
Their goal is to build the systems, methodologies, and tooling that allow organizations to understand how models behave in practice, identify risk, and generate defensible evidence for deployment and regulatory decisions.
The role:
This is not a traditional machine learning engineering role.
You will not spend your time simply training models or chasing benchmark improvements.
Instead, you will investigate how AI systems behave in real-world environments, determine where validation approaches break down, identify sources of risk, and help define what evidence is required to support safe deployment.
The work sits at the intersection of:
- Medical Imaging AI
- Machine Learning Evaluation
- Model Robustness & Reliability
- AI Safety & Validation
- Regulatory Evidence Generation
- Software Engineering
As one of the earliest technical hires, you will play a key role in shaping both the product and the methodology used to evaluate safety-critical AI systems.
Investigate Model Behaviour & Performance
- Design and execute evaluations for medical imaging AI systems
- Analyze performance across populations, institutions, scanners, imaging protocols, and clinical workflows
- Investigate failure modes, robustness limitations, and generalization gaps
- Evaluate distribution shift, demographic bias, subgroup performance, and deployment risks
- Produce evidence that supports, challenges, or refines claims about model performance and safety
Develop AI Validation Methodology
- Define frameworks for structuring claims, arguments and evidence
- Determine what evidence is sufficient for deployment, regulatory, and clinical decision-making
- Challenge assumptions and identify weaknesses in existing validation approaches
- Transform recurring investigations into repeatable workflows and reusable methodologies
- Help establish best practices for evaluating safety-critical AI systems
Build Product & Evaluation Infrastructure
- Write production-quality Python code supporting evaluation workflows
- Develop reusable investigation pipelines and benchmarking frameworks
- Build agentic workflows that automate evidence generation and analysis
- Prototype customer-facing functionality using modern AI development tools
- Collaborate with customers, researchers, clinicians, and regulatory stakeholders