Federated learning: Healthcare AI that doesn’t risk patient data privacy

Bardia M. Zanganeh

Traditional machine learning techniques pose a risk to patient data privacy, blocking the general progress of AI in healthcare and life sciences. Enter federated learning: an emerging technique that offers both cost-saving and security benefits.

Insight in brief

  • Traditional machine learning techniques pose a risk to patient data privacy, which holds back the progress of AI in healthcare and life sciences
  • Federated learning is an emerging technique that aims to get around this issue by keeping patient data private, at source, while sharing only outcomes that don’t contain personal information
  • This is a huge opportunity for healthcare and life science brands: many hold a wealth of data that, used in the right ways, could generate the insights needed to deliver much more effective solutions

“AI is only as good as the humans programming it, and the system in which it operates. If we are not careful, AI could not make healthcare better, but instead unintentionally exacerbate many of the worst aspects of our current healthcare system.”
Prof. Bob Kocher
Stanford University

The problem with traditional machine learning in healthcare

To explain what federated learning is and why it works so well in a healthcare setting, we first need to explain why the traditional machine learning model doesn’t work so well…

In 2017, IBM’s Watson – one of the most famous applications of AI in healthcare – reportedly prescribed a drug that could have killed a patient during a simulation. Internal IBM documents revealed the supercomputer had “often” recommended “unsafe and incorrect” cancer treatments, while customer assessments said the event posed “serious questions about the process for building content and the underlying technology.” (IBM contests these allegations.)

How did this happen?

The documents blame poor training of Watson’s software, saying it only used a limited set of “hypothetical” cancer cases instead of real patient data, and that it followed recommendations from just a handful of specialists instead of verified “guidelines or evidence”.

So why wasn’t Watson using a wealth of pre-existing, real-world patient data? In fact, why wouldn’t every healthcare and life science brand today ensure they always use a sufficient amount of real-world patient data in AI applications?

Welcome to one of the biggest paradoxes in healthcare today: despite the potential for AI to truly transform healthcare and make life better for patients, the ability to access and use patient data in the right ways at scale is a hugely complex challenge for a traditional machine learning model.

First of all, medical data has unique properties that make it extremely hard to integrate – for example, medical data types range from free-text clinical notes to heterogeneous medical images. Then there’s the fact that medical data sources themselves are siloed across disparate, incompatible systems. Combine these challenges with the privacy concerns around the use of patient data – data which has the highest requirements for protection and anonymization – and you can start to see why it’s so hard for AI systems like Watson to obtain sufficiently large, real-world data sets while still respecting their responsibility towards the data owner: the patient.

The biggest barrier of all? Traditional machine learning involves a data pipeline that pools all the data at a central server. That’s not necessarily a problem in other settings, but the downside of this architecture in a healthcare setting is that it requires all the data collected by local devices and sensors to be sent back to the central server for processing – which means all those sensitive, highly protected data assets suddenly have to be shared.

In a nutshell, it’s a stalemate.

Divide and conquer: how federated learning changes everything

Federated learning is an emerging machine-learning technique that gives devices the power to learn collaboratively from a shared model. It works by training models individually on distinct, isolated datasets, and sharing only the trained models, which no longer contain the underlying personal data. The devices send their specific models to a centralized server, where the models are averaged to obtain a single, combined model. This cycle repeats for several iterations until a high-quality model is obtained.
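
To make that loop concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation of any particular framework: the tiny linear model, the synthetic "site" data, the function names and the choice to weight each site's model by its number of samples are all assumptions made for this example. The point to notice is that only model parameters ever reach the server; the raw records stay in each site's own list.

```python
import random


def local_train(weights, data, lr=0.5, epochs=5):
    """Train a 1-D linear model y = w*x + b on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(models, sizes):
    """Combine site models on the server, weighting each by its sample count.

    Weighting by sample count is an assumption of this sketch; a plain
    unweighted mean would also match the description in the text.
    """
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(models, sizes)) / total
    b = sum(m[1] * n for m, n in zip(models, sizes)) / total
    return (w, b)


def run_federated(sites, rounds=50):
    """Repeat: send the global model out, train locally, average the results."""
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        # Each site trains on its own data; only the parameters come back.
        local_models = [local_train(global_model, data) for data in sites]
        global_model = federated_average(local_models, [len(d) for d in sites])
    return global_model


def make_site(n, seed):
    """Hypothetical private dataset: noiseless samples of y = 2x + 1."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]


sites = [make_site(20, 1), make_site(35, 2), make_site(50, 3)]
w, b = run_federated(sites)
print(f"combined model: w={w:.3f}, b={b:.3f}")
```

In this toy setup every site's data follows the same underlying relationship, so the combined model settles close to it. Real clinical data is rarely that uniform across institutions, which is one reason the aggregation step deserves care.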

Figure: How federated learning keeps patient data private
Figure: Federated learning applications currently in play

The benefits of this new machine-learning paradigm are huge.

  • First of all, the models themselves will be smarter thanks to the collaborative training process, which means better decisions down the line.
  • Second, latency will be lowered, since the new model will be able to make predictions locally.
  • Third, healthcare brands won’t actually have to be near the data – or of course, the patients themselves – to get the insights they need to develop better solutions. Insights will be derivable from anywhere.
  • And the biggest benefit of all? Patient data privacy will have been maintained throughout, since the original, identifiable data itself will always be restricted to isolated edge devices.  

Which federated learning applications are in use today?

Even though federated learning is relatively new, and has yet to prove itself in a regulated production environment, there are a few healthcare giants pioneering the technology.

For example, the ACR Data Science Institute is piloting NVIDIA Clara FL – a federated learning framework for medical devices – in “AI-LAB”, a data-science toolkit designed to democratise AI by giving radiologists the power to develop algorithms at their own institutions – using their own patient data.

And machine-learning startup Owkin (the company behind game-changing initiatives like Melloddy) has rolled out a new federated learning platform called Owkin Connect. It gives data owners the power to define their own data authorizations and track their data usage: an unforgeable ledger records which data is used by which model for training, and how that data contributes to the parameters of the model.
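
The idea of a usage ledger can be sketched with a simple hash chain. This is a hedged illustration only and says nothing about how Owkin Connect is actually built: the `UsageLedger` class, its fields and the dataset and model identifiers are hypothetical, and a hash chain on its own is tamper-evident rather than truly unforgeable (a production system would also need signatures or a distributed ledger).

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class UsageLedger:
    """Append-only log of which dataset was used by which model (hypothetical)."""

    def __init__(self):
        self.entries = []

    def append(self, dataset_id, model_id, purpose):
        record = {"dataset": dataset_id, "model": model_id, "purpose": purpose}
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        self.entries.append(
            {"record": record, "prev": prev, "hash": entry_hash(prev, record)}
        )

    def verify(self):
        """Return True only if no entry has been altered, removed or reordered."""
        prev = GENESIS_HASH
        for entry in self.entries:
            expected = entry_hash(prev, entry["record"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


ledger = UsageLedger()
ledger.append("hospital-a/scans", "model-v1", "training")
ledger.append("hospital-b/scans", "model-v1", "training")
print("ledger intact:", ledger.verify())
```

Because each entry's hash covers the previous entry's hash, changing any earlier record invalidates every entry after it, which is what lets a data owner audit how their data was used.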

What’s next for federated learning?

These are very early days, and some important questions remain about how pharmaceutical and life science companies can be sure they’re applying federated learning safely. For example, when a central server aggregates a group of trained algorithms, what is best practice from a process perspective?

When it comes to questions like this, every company and even every department is unique. That’s why we would like to hear from you: What are your challenges? What are your goals? And how would you like to achieve them? 

Bardia M. Zanganeh

Senior Business Development Manager
Contact person for Switzerland
Bardia.Zanganeh@zuehlke.com
+41 43 216 6788

Bardia M. Zanganeh is responsible for the Life Sciences and Healthcare practice in Switzerland. He serves leading healthcare institutions on all technology agenda issues. His primary areas of focus include digital innovation, business model transformation and product innovation. He also serves providers as well as medical technology and pharmaceutical companies. He has a background in engineering, consulting and entrepreneurship and is a lecturer at the University of Applied Sciences in Business Administration in Zurich.