Federated Learning makes protected patient data usable for medical research
In this article we highlight possible use scenarios for the application of federated learning in medical research. The method helps to overcome data biases and allows organisations to share model parameters without giving up data ownership.
Insight in brief
- The Covid-19 pandemic is an impressive demonstration of how quickly decades of industry in-fighting can be abandoned and successful cooperation becomes possible.
- From a technological point of view, new approaches in data science and machine learning provide key building blocks for effective solutions in the healthcare sector.
- In this article we highlight possible use scenarios for the application of federated learning in medical research.
Medicine has made great progress in the past decades. The chances of surviving cancer, for example, are better today than ever. Many infectious diseases that would have prematurely ended a human life in the past, or at least severely impaired it, are now almost extinct. And yet mankind is still far from being able to counteract every disease with a proven remedy. Whether it’s a matter of new virus variants or the thousands of previously incurable rare diseases, in order to achieve further therapeutic breakthroughs, research and development must be intensified and take place collectively across company boundaries.
The Covid-19 pandemic is an impressive demonstration of how quickly decades of industry in-fighting can be abandoned and successful cooperation becomes possible. From a technological point of view, new approaches in data science and machine learning provide key building blocks for effective solutions in the healthcare sector.
The EU-funded project Melloddy (Machine learning ledger orchestration for drug discovery) shows that networking and data exchange in medicine can work in principle and are wanted. The project uses machine learning to make drug discovery more effective and efficient. More than one billion private, competition-relevant data items on drug discovery are structured, linked and made available for cross-company analysis. Via a secure platform, the ten pharmaceutical companies participating so far can draw on information from competition-critical data without any one company's data having to be accessible to the others.
Data protection as a dilemma of medical research
Such cooperation in the use of data across company and country borders can be a powerful lever. But Melloddy is only the beginning, and still more the exception than the rule. All players recognise the enormous potential of linking R&D data silos for drug research. Often, however, neither the pharmaceutical companies nor the hospitals or other research institutions are allowed or willing to share their data, for example because the laws of their country prohibit it or because they have to protect their own data assets from the eyes of the competition. In case of doubt, data that is important for medical research remains unused. Federated Learning offers a technological solution to this dilemma.
Federated Learning means that machine-learning models are trained and used decentrally, without data having to be "collected" and stored centrally.
Under the motto "bringing code to data", the procedure was originally developed to relieve central data-processing resources. Now that cloud computing makes capacity available practically on demand, computing power itself plays a subordinate role. For medical research, another aspect of Federated Learning is of greater importance: overcoming "data biases". If a model can learn from more data, and from more diverse sources, the biases inherent in small individual data sets can be reduced.
Linking of patient data in hospitals
An example of the impact of data privacy on medical research is the patient data collected and stored in hospitals. This data is valuable for research, especially when viewed together with data from other hospitals, because the data volume of an individual hospital is often too small to yield statistically robust results on its own. Confounding factors such as a regionally determined demographic bias (e.g. genetic background) can distort the results. However, sharing the data with many other hospitals or creating a data pool is often out of the question due to the legal restrictions already mentioned or to data privacy considerations.
Federated Learning can solve this problem because it brings the machine-learning model to the hospital system, trains it on site with the local data and then leaves the hospital system again. In this way, only the result – an updated and better trained model – but not the underlying data, is made visible or usable for the operators and "trainers" of the model. Federated Learning thus makes it possible to use data for research without explicitly disclosing it. This means that the raw data can still only be viewed by the clinics themselves, thus protecting the data assets. Ultimately, this also increases the value of individual, locally collected small data sets, because they add value in the context of other participants' data.
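The round trip described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: it assumes federated averaging (a common aggregation scheme that this article does not itself name), a simple linear model, and three hypothetical hospitals with synthetic data. All function names and numbers are invented for the example.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Train a copy of the global model y = w*x + b on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the updated parameters and the sample count leave the site.
    return (w, b), n

def federated_average(updates):
    """Combine site models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)

def make_hospital_data(rng, n, x_low, x_high):
    """Synthetic stand-in for patient data: y = 2x + 1 plus a little noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)))
    return data

def run_federated_training(hospitals, rounds=50):
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # The model travels to each hospital; the raw data never moves.
        updates = [local_update(global_weights, data) for data in hospitals]
        global_weights = federated_average(updates)
    return global_weights

rng = random.Random(0)
# Each hypothetical hospital only sees a narrow, "regionally biased" range of x.
hospitals = [
    make_hospital_data(rng, 40, 0.0, 1.0),
    make_hospital_data(rng, 25, 1.0, 2.0),
    make_hospital_data(rng, 60, 2.0, 3.0),
]
final_w, final_b = run_federated_training(hospitals)
print(f"global model: y = {final_w:.2f} * x + {final_b:.2f}")
```

Each site sees only a narrow slice of the input range, yet the averaged model is fitted across all of them. The coordinator in this sketch handles nothing but parameters and sample counts.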
Linking of study-centre data for Orphan Diseases
Another use scenario for Federated Learning could be the research and development of therapies for rare diseases (orphan diseases), of which there are around 5,000 to 8,000 in the EU. Most of them cannot be treated causally; there is no cure yet.
A major challenge for research on rare diseases is that patients often live very far apart from one another. Under these conditions, clinical trials are particularly difficult for all those involved. In the context of virtual clinical studies, Federated Learning could act as an enabler here too. Data collected around the clock at patients' home locations all over the world can be made available for research into new active ingredients and therapies, and these small data cohorts can be used in parallel in different studies. Federated Learning again ensures data privacy while allowing many companies to make efficient use of the results.
Linking of data from Medical Grade Wearables
The Internet of Things has become an important data source in the healthcare sector. The medical application of Medical Grade Wearables, mostly as a combination of app and hardware, ranges from disease-management solutions through telehealth applications to the treatment of chronic diseases or care of the elderly. Wearables are isolated data islands, however. For data protection and data security reasons, the collected data remains at the level of the app. Furthermore, the computing power of the individual devices is limited.
Federated Learning can help to overcome these limitations and make the patient data stored on the wearables usable for drug research or the improvement of treatment concepts. A prerequisite, however, is a way around the limited computing power of the devices. This could be achieved by dividing the calculation of the network layers between the wearable and the central server: the server would take over most of the computation, while the raw data stays on the device and data privacy is preserved. This would open up a new data source for the research and development of new drugs, and for different parts of the healthcare system rather than just the big tech players. It could also create new revenue streams for patients through the sharing of their healthcare data.
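The idea of dividing network layers between device and server is often called split learning, and it can be sketched as follows. This is a simplified illustration, not the design of any specific framework: the class names, the tiny two-layer network and the synthetic "sensor" data are all hypothetical, and the sketch assumes that the server is allowed to see the training labels.

```python
import math
import random

class WearableClient:
    """Holds the raw sensor data and only the first, small layer."""
    def __init__(self, n_inputs, n_hidden, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]

    def forward(self, x):
        self.x = x
        self.h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
                  for row in self.w]
        return self.h  # only activations leave the device, not x itself

    def backward(self, grad_h, lr):
        for j, row in enumerate(self.w):
            grad_pre = grad_h[j] * (1.0 - self.h[j] ** 2)  # tanh derivative
            for i in range(len(row)):
                row[i] -= lr * grad_pre * self.x[i]

class Server:
    """Holds the remaining layer and computes the loss."""
    def __init__(self, n_hidden, rng):
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b = 0.0

    def train_step(self, h, target, lr):
        pred = sum(vj * hj for vj, hj in zip(self.v, h)) + self.b
        err = pred - target
        # Gradient w.r.t. the activations, computed before updating v,
        # is the only thing sent back to the wearable.
        grad_h = [2.0 * err * vj for vj in self.v]
        for j in range(len(self.v)):
            self.v[j] -= lr * 2.0 * err * h[j]
        self.b -= lr * 2.0 * err
        return err ** 2, grad_h

rng = random.Random(42)
# Synthetic stand-in for sensor readings and a target value.
samples = []
for _ in range(50):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    samples.append((x, 0.8 * x[0] - 0.5 * x[1]))

client = WearableClient(n_inputs=2, n_hidden=4, rng=rng)
server = Server(n_hidden=4, rng=rng)

epoch_losses = []
for epoch in range(30):
    total = 0.0
    for x, y in samples:
        h = client.forward(x)                              # on the wearable
        loss, grad_h = server.train_step(h, y, lr=0.05)    # on the server
        client.backward(grad_h, lr=0.05)                   # back on the wearable
        total += loss
    epoch_losses.append(total / len(samples))

print(f"mean loss, first epoch: {epoch_losses[0]:.4f}")
print(f"mean loss, last epoch:  {epoch_losses[-1]:.4f}")
```

In a real deployment most of the layers would sit on the server side, so the wearable only has to evaluate and update a very small part of the network.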
Zühlke supports you in building your Federated Learning solution
Federated Learning has the potential to become a game changer in the healthcare sector, as it enables new insights to be gained from data that has already been collected and thus enables new, better drugs to be developed. It is clear that Federated Learning is a platform topic that must be approached in an interdisciplinary and far-sighted manner. This concerns technological and procedural as well as organisational and legal aspects.
Building a sustainable Federated Learning solution that brings real added value to medicine requires a lot of experience in the development and implementation of regulated AI and Medical Grade Machine Learning. In particular, it requires the ability to combine cutting-edge know-how from the medical, regulatory and data-science fields, as well as from software and hardware engineering. This is exactly where Zühlke comes in.
Federated Learning makes it possible to unlock a treasure trove of insights across multiple data silos that ultimately benefits patients, without having to compromise on security and data privacy.
Yuan, B., Ge, S., Xing, W. (2020). A Federated Learning Framework for Healthcare IoT Devices. arXiv preprint arXiv:2005.05083.