
Why is it so difficult to successfully get AI adopted into clinical care?

Photo by National Cancer Institute on Unsplash

A look into a scientific review paper that asked that question and found answers

Artificial intelligence (AI) is becoming increasingly prevalent in our daily lives, from recommendation systems in almost every webshop to automatic translation of foreign languages on the websites you visit. For some industries, though, this transition seems to be going more smoothly than for others. The medical field seems especially challenging to enter, but why? There is so much academic activity dedicated to AI in the medical space, so what’s keeping these technological breakthroughs from having a tangible impact in healthcare? Sendak et al. tried to answer that question in their review paper “A Path for Translation of Machine Learning Products into Healthcare Delivery” (2020). Their findings really resonated with my experience at UbiOps working with MedTech startups, so in this article I will walk you through their paper.

Before we dig into the paper itself, let’s have a quick look at the current state of machine learning in the medical field. Enthusiasm about its possibilities has been immense, leading to an astonishing amount of literature on the topic. Every other week you can read about a new study on using ML for cancer detection, and ML for drug discovery also appears to be a hot topic. A multitude of conferences, organizations and academic journals have been set up to disseminate knowledge on ML in healthcare.

Even though the research is proliferating fast, evidence of tangible clinical impact remains scant. You might think, “Oh, but it always takes a while before new technologies become mature enough to be applied in practice”, yet at the same time we can see that new findings on using ML for better user retention are quickly adopted by the likes of TikTok, Instagram and LinkedIn. Panch et al. eloquently describe this ‘inconvenient truth’ of machine learning in healthcare:

“at present the algorithms that feature prominently in research literature are in fact not, for the most part, executable at the front lines of clinical practice.”

Luckily, some healthcare companies have successfully managed to integrate AI/ML into their products. Companies like Ellogon, which helps doctors select the right patients for cancer immunotherapy, show that it is possible to make the transition from proof of concept to a mature product that is fully integrated into existing medical protocols.

What is it that differentiates the ML products that have been successfully integrated into healthcare from the ones that never make it beyond the proof-of-concept phase? Let’s have a look at the research from Mark Sendak et al. to find out.

Mark Sendak and his colleagues set out to perform a narrative review that could help understand how to translate machine learning into healthcare. They combined their own first-hand experience in building machine learning products with 21 case studies of machine learning models that successfully made their way into clinical care. This is exactly what I find so interesting about their research: they tried to learn from those that actually made the step into production.

Based on their analysis of these 21 case studies, the authors identified the core phases and challenges when moving to a mature product in the healthcare world.

The authors managed to map all 21 success stories back to what they refer to as “the translational path” (see figure). They note that, in going from proof of concept to a proper ML product, there are four key phases:

  1. Design and develop: The process of identifying the right problem to solve, and designing and developing a machine learning tool that can create actionable insights.
  2. Evaluate and validate: Evaluate whether the product can actually improve clinical care and patient outcomes, whether it is accurate and reliable, and whether there is a business case to be made for the product.
  3. Diffuse and scale: The process in which a proof of concept is truly scaled up to an integrated product. It requires scaling the deployment of the model and diffusing it to early adopters.
  4. Continuous monitoring and maintenance: No ML product is ever finished. Models need to be continuously monitored and updated to avoid faulty behavior, which in healthcare can have serious repercussions (a minimal monitoring sketch follows this list).
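
To make that monitoring phase slightly more concrete, here is a minimal sketch of one common approach: comparing the distribution of recent live predictions against the distribution observed during validation. This is my own illustration rather than anything prescribed by Sendak et al.; the function, threshold and score lists are hypothetical.

```python
from scipy.stats import ks_2samp

def drift_alert(reference_scores, recent_scores, p_threshold=0.01):
    """Flag when live model outputs drift away from the validation-time distribution.

    reference_scores: predicted probabilities collected during validation (hypothetical).
    recent_scores: predicted probabilities from a recent window of production traffic.
    """
    # Two-sample Kolmogorov-Smirnov test: a small p-value suggests the two
    # distributions differ, which is a cue to investigate and possibly retrain
    # before faulty behavior reaches clinicians.
    _statistic, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < p_threshold
```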

These phases are not necessarily sequential, and teams can find themselves moving back and forth between them in an iterative fashion. See the figure below for more details on the translational path.

Image by Sendak et al. taken from “A Path for Translation of Machine Learning Products into Healthcare Delivery”. Image describes the various phases of the translational path.

The review does a great job of describing the numerous challenges and frustration points in creating ML-powered products in healthcare, from technical infrastructure challenges to ethical risks. I will not go through all of them, but I want to highlight a few points that I recognised.

Domain knowledge versus productionisation knowledge

When developing medtech tools there is always a tension between domain knowledge and productionisation knowledge. You can only have so many people on your team, and you need enough medical experts involved as well as the right people who can actually build and deploy your solution. Where to put the focus depends heavily on your situation, but Sendak et al. do a great job of highlighting the importance of having both capabilities represented on your team in some way if you want to be successful.

Of course, not every skill set needs to be represented by an actual person on the team! Certain things can also be outsourced, or tools can be brought in that take care of the standard tasks so experts can focus on what’s unique to your solution. I see so many companies getting swept up in building their own platforms with a bunch of open-source tooling because it’s free. But let’s not forget the costs associated with having people on your team who have to invest time and energy into setting up all of that tech! While they are busy trying to get a deployment tool working, you are losing time that you could have spent on actually improving your model and driving value…

When is a model “good”?

It’s often unclear what differentiates a good model from a bad one, and what performance you should be striving for in your specific case. If left undiscussed, this can lead to a mismatch between expectations and reality. Importantly, this doesn’t only concern model accuracy, but also usability and economic viability. A model with amazing accuracy that takes 10 hours to run will probably be neither very useful nor affordable. Every case is different, and it’s key to have a conversation early on to identify and agree on the relevant model performance metrics.

Image by author
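
As an illustration of agreeing on metrics beyond accuracy, here is a minimal sketch that checks a trained classifier against both a discrimination threshold and a latency budget. The threshold values and the scikit-learn-style model interface are assumptions made for the example, not figures from the paper.

```python
import time
from sklearn.metrics import roc_auc_score

# Hypothetical acceptance criteria, agreed with clinicians and stakeholders up front.
MIN_AUC = 0.85        # example discrimination threshold
MAX_LATENCY_S = 2.0   # example per-prediction budget so the tool fits the clinical workflow

def acceptance_check(model, X_test, y_test):
    """Evaluate a trained binary classifier on accuracy AND usability criteria."""
    start = time.perf_counter()
    scores = model.predict_proba(X_test)[:, 1]   # assumes a scikit-learn-style API
    latency_per_sample = (time.perf_counter() - start) / len(X_test)

    auc = roc_auc_score(y_test, scores)
    return {
        "auc": auc,
        "latency_s": latency_per_sample,
        "passes": auc >= MIN_AUC and latency_per_sample <= MAX_LATENCY_S,
    }
```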

Demonstrating validity of the product in an isolated context is not enough

Just because the product performed great in controlled test environments and on test datasets does not mean it will perform well in a real-life setting. It’s important to feed real-life data through the product and to use it to assess performance. At UbiOps we focus on deployment and serving, and we have seen many times that performance can change tremendously once the model is introduced to actual production data! It’s important to get to that stage early, even if it’s just as a shadow deployment.
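
A shadow deployment can be as simple as running the new model next to the existing workflow and only logging its output. The sketch below is a generic illustration of that pattern; clinician_workflow and shadow_model are hypothetical callables, not part of any specific product.

```python
import logging

logging.basicConfig(filename="shadow_predictions.log", level=logging.INFO)

def handle_case(patient_features, clinician_workflow, shadow_model):
    """Serve the existing workflow; run the candidate model silently alongside it."""
    # The existing process remains the source of truth for care decisions.
    decision = clinician_workflow(patient_features)

    # The shadow model sees the same real-world input, but its prediction is
    # only logged for later comparison - it never reaches the clinician.
    try:
        shadow_prediction = shadow_model(patient_features)
        logging.info("input=%s shadow=%s actual=%s",
                     patient_features, shadow_prediction, decision)
    except Exception as exc:
        logging.warning("Shadow model failed: %s", exc)  # failures must never block care

    return decision
```

Comparing the logged shadow predictions with what actually happened then gives you a first real-world performance assessment.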

Integration into production environments is difficult

The authors note that there is often a massive difference between the actual production environment and the development and data-storage environments. In the cases they investigated, they found that significant effort and investment were often needed to integrate products into the existing systems. One study estimated the cost of validating and integrating the Kidney Failure Risk Equation into clinical workflows at a single site at nearly $220,000. That’s only a single site!

Data is spread across cloud and on-premise environments

An important issue that this review brings to light is that data is spread out across various cloud solutions and on-premise data centers. This typically starts causing issues in the diffuse and scale phase. Being aware of it while you design your product and architecture can greatly benefit the transition from proof of concept to properly rolled-out product.

Continuously changing regulatory frameworks

Another major challenge relates to compliance, data security and the sheer amount of regulation and required certification for medical devices and software. Not to mention the fact that the rules are continuously changing. It’s difficult to stay on top of everything and make sure that every part of your product is fully compliant. Data security is a particular obstacle, as the data is so sensitive.

What now?

I’ve walked you through the translational path and its main obstacles, so what now? I think it all starts with awareness and open discussion when you set out to create a new ML-powered product in the healthcare sector. Familiarize yourself with the challenges of those who went before you, and ask how you can learn from their mistakes.

Image by author

In my opinion the most important thing is to not be afraid of actually getting to that diffuse and scale step. It’s absolutely crucial to get out of the development environment and run things in production, even if only in shadow mode. Only after making that step can you start moving towards a product that actually has impact and value.

So how do you make sure that you can actually run things in production? Invest in infrastructure that helps you iterate quickly and lets you focus on what you’re good at: building models for medical use cases. Investing in MLOps tools that help you deploy quickly and monitor what’s going on means you can concentrate on the actual challenges at hand rather than on standard infrastructure problems. As mentioned earlier, building your own platform from a pile of open-source tooling is rarely free once you account for the time your team spends setting it up instead of improving the model and driving value.

When it comes to data security, it can help to ensure that the tooling you are using already has the right certifications (like ISO certifications). Especially in Europe, it can also make sense to work with more niche cloud providers rather than the big three. With a cloud provider that specializes in medical data, you will have far less headache proving that you comply with all the rules.

There are many factors at play when it comes to getting AI adopted in routine clinical care, from regulations to architecture problems to simply getting the right people involved. Sendak et al. managed to capture the phases and obstacles succinctly in their “translational path”. Being aware of its four distinct phases and the obstacles that might come your way will go a long way toward setting you up for success.

