GUEHDS
The GUEHDS project will underscore the significance of collaborative European health initiatives and data-sharing frameworks, particularly those advocated by the European Health Data Space (EHDS).
Promptly will contribute a Federated Learning (FL) framework, while Instituto Pedro Nunes (IPN) will adapt a cybersecurity tool, enabling the fast deployment of an operational Federated Network. This solution will enable data custodians to grant and revoke permissions on the data they control, and to monitor the data used by FL tasks at the different data nodes.
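As a minimal sketch of the custodian-side control described above, the Python below models one data node that records grants, honours revocations, and logs every access attempt made by an FL task. The names (`DataNode`, `Grant`, the `purpose` string, the log fields) are hypothetical and do not reflect PySyft's API or IPN's tool; they only illustrate the grant/revoke/monitor cycle.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """One custodian permission: a processor may use a dataset for a stated purpose."""
    processor: str
    dataset: str
    purpose: str


@dataclass
class DataNode:
    """Hypothetical data node holding its custodian's grants and an access log."""
    custodian: str
    grants: set = field(default_factory=set)
    access_log: list = field(default_factory=list)

    def grant(self, processor: str, dataset: str, purpose: str) -> None:
        self.grants.add(Grant(processor, dataset, purpose))

    def revoke(self, processor: str, dataset: str, purpose: str) -> None:
        # discard() does nothing if the grant was never given
        self.grants.discard(Grant(processor, dataset, purpose))

    def request_access(self, task_id: str, processor: str, dataset: str, purpose: str) -> bool:
        allowed = Grant(processor, dataset, purpose) in self.grants
        # Record allowed and denied attempts so the custodian can monitor FL tasks
        self.access_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "task": task_id,
            "processor": processor,
            "dataset": dataset,
            "purpose": purpose,
            "allowed": allowed,
        })
        return allowed


node = DataNode(custodian="hospital_a")
node.grant("researcher_1", "admissions", "tripledemic_study")
print(node.request_access("task-001", "researcher_1", "admissions", "tripledemic_study"))  # True
node.revoke("researcher_1", "admissions", "tripledemic_study")
print(node.request_access("task-002", "researcher_1", "admissions", "tripledemic_study"))  # False
```

Logging denied attempts as well as allowed ones is a deliberate choice in this sketch: a custodian monitoring FL tasks is usually as interested in refused requests as in granted ones.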
The testing data partners, CHUC and CHUdSA, will pilot-test the GUEHDS solution within a clinical study on the epidemiological trends of respiratory viruses, namely the epidemiology of co-infection with respiratory syncytial virus (RSV), influenza (flu), and COVID-19: the so-called Tripledemic.
This pilot will lay the groundwork for the Portuguese Observatory for Respiratory Diseases, an initiative that can be scaled to the European level together with regulators (EMA) and life science companies.
- Motivation for the project: Healthcare providers (data custodians) hold valuable data from patients (data owners). In the absence of good data-sharing models, this data is not fully exploited beyond direct patient benefit (primary use of data), which hinders innovations driven by its secondary use (e.g. assisted clinical management decisions, potential new treatments). A lack of security, transparency, and control over the accessed data is a primary challenge for data custodians, whether in the scope of a specific clinical study or in the implementation of a whole data space.
- Generic use case description: Healthcare providers reported that they face problems of data duplication, long-term retention, and intensive data collection that goes beyond data minimization. Furthermore, to spot emerging trends and act within the required timeframes in critical situations, the traditional centralized process simply does not work. FL is an instrumental technology for enabling the analysis of critical epidemiological risk indicators. With our FL infrastructure, federated computations will run at the hospitals’ premises to retrieve information such as the number of hospital admissions, length of stay, intensive care unit (ICU) beds, patient demographics, lab data, clinical exams (imaging and functional studies), and -omics data (mapping genomic profiles to identify viral strains). When available, wearable data (e.g. respiratory rate) and patient outcomes data will also be used. The system will return aggregated results to monitor roughly half of the Portuguese population. Controlling what data is shared, with whom, and for what purpose then becomes an important challenge. The solution will allow the secure definition of data access permissions, enabling data custodians to manage their consent to provide data to specific researchers (data processors). To ensure transparency, adequate auditing information on permission assignments and data access operations will be stored in a distributed ledger. A monitoring dashboard will allow the visualization of audit data.
- Essential functionalities: Promptly already provides several technological building blocks for the centralized data value chain, covering data acquisition, preprocessing, and analysis. With this project, we will expand into the decentralized FL universe, leveraging our recent developments in FL frameworks and the NGI Zero Entrust project. Establishing a Federated Network is in itself a way to provide stronger security and privacy than data centralization. However, there are many organizational, technical, and economic issues related to delivering and maintaining infrastructure, cybersecurity, implementing and enforcing governance and standards, and managing data. Our innovative strategy will: 1) adapt the above-mentioned technological building blocks; 2) deploy existing state-of-the-art FL tools; 3) refine an existing governance framework; and 4) evolve an existing solution for managing consent. The final GUEHDS solution will bring data sovereignty and trustworthiness to the healthcare setting, while aiming to deliver a novel end-to-end solution for FL.
- How these functionalities can be integrated within the software ecosystem: In our current systems, data is ingested from multiple formats and sources (HL7, databases, CSV files) and subsequently standardized to the OMOP Common Data Model. In this project, using this harmonized data layer as a foundation, we will generate datasets that integrate seamlessly into the decentralized FL universe, enabling a highly functional data network built on standard concepts. PySyft is an open-source library for secure and private data science. Its flexibility and support for general data computation over federated datasets make it a suitable basis for our Federated Network. We will build upon PySyft to add auditing capabilities and consent management functionality. PySyft already implements a system for computation authorization, but it has no concept of rule-based access, which we will develop. Data provenance is a critical enabler of security and privacy when actions need to be attributed in distributed systems. Thus, we will refine the PROV-DM model for distributed systems so that it can be applied to a federated system to supervise data access and permission granting, and to detect anomalies. We will achieve this by mapping actions and concepts from healthcare data governance and federated networks to PROV-DM entities and relationships.
- Gap being addressed: Currently, there are no end-to-end solutions capable of combining Electronic Health Records (EHR) with other data sources to conduct longitudinal and real-time studies in a transparent, secure, and privacy-preserving manner. This requires a combination of soft and hard infrastructure that must be harmonized into plug-and-play solutions. This need became evident during the recent COVID-19 pandemic, when 40% of healthcare organizations were unable to share data. This undermines any expectation of quickly identifying new epidemic trends and signals an early failure in the implementation of data spaces. Moreover, if only larger and wealthier entities can access and explore health data spaces, social and ethical imbalances arise, which runs counter to our common vision that all citizens shall benefit from equal access to healthcare in Europe.
- Expected benefits achieved with the novel technology building blocks: By serving as a good example of a small-scale EHDS pilot implementation, GUEHDS will contribute to a European and global trend in data governance and privacy compliance, and to the valorization of the industrial data value chain. On the one hand, we will lift the demand side by tackling an imminent population health problem, thus raising societal awareness and demonstrating the potential of the secondary use of health data. On the other hand, by co-creating, optimizing, and implementing leading-edge technologies with healthcare providers, we will lower their adoption barriers, thus expanding the supplier side of the data value chain. Altogether, this pushes other companies to improve their products and services, which indirectly introduces new market players and creates data marketplaces and new partnerships between distinct verticals, forming fertile ground for novel AI solutions.
- Potential demonstration scenario: We considered three stakeholder groups with different purposes: 1) Healthcare providers: physicians, who will see greater diagnostic accuracy with personalized algorithms; hospital administrators, who will have a simplified process for incorporating heterogeneous data, as well as quality control tools for data transfer and automatic data mapping; 2) Research workers: clinical investigators, who will use our platform in their daily practice; engineers, who will assemble specialized teams to provide infrastructure installation and maintenance; data specialists, who will develop smoother, continuous integration workflows; 3) Public entities and industrial partners: patient associations, representing patient perspectives on improving their diagnosis; governments, who will provide updated and accurate data; businesses, with more market opportunities. For effective knowledge transfer, we will participate in national and international activities aligned with the GUEHDS project.
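The generic use case above states that federated computations run at each hospital and that only aggregated results are returned. A minimal sketch of that pattern, assuming each node reports an admission count and a summed length of stay (hypothetical names `LocalSummary`, `local_summary`, `combine`), shows why counts and sums, rather than per-site means, are what the coordinator needs. This is plain Python, not PySyft code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregates a node returns; no patient-level rows leave the hospital."""
    admissions: int
    total_los_days: float  # sum of length of stay over those admissions


def local_summary(length_of_stay_days: list[float]) -> LocalSummary:
    # Runs at the hospital's premises on its own records
    return LocalSummary(admissions=len(length_of_stay_days),
                        total_los_days=sum(length_of_stay_days))


def combine(summaries: list[LocalSummary]) -> dict:
    # Runs at the coordinator: only counts and sums are combined
    admissions = sum(s.admissions for s in summaries)
    total_los = sum(s.total_los_days for s in summaries)
    mean_los = total_los / admissions if admissions else 0.0
    return {"admissions": admissions, "mean_length_of_stay": mean_los}


hospital_a = local_summary([3.0, 5.0, 4.0])  # computed at hospital A (mean 4.0)
hospital_b = local_summary([2.0, 10.0])      # computed at hospital B (mean 6.0)
result = combine([hospital_a, hospital_b])
print(result)  # {'admissions': 5, 'mean_length_of_stay': 4.8}
```

Averaging the two per-hospital means would give 5.0; weighting by admissions gives the pooled 4.8, which is why this sketch has nodes return counts and sums.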
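The use case also calls for audit records of permission assignments and data accesses to be stored in a distributed ledger. The sketch below illustrates only the tamper-evidence property such a ledger provides, using a single-process hash chain in which each entry commits to the previous entry's SHA-256 hash. It has no consensus or replication, is not the ledger the project will use, and its event fields are hypothetical.

```python
import hashlib
import json


class AuditChain:
    """Append-only, hash-chained audit log (single-process stand-in for a ledger)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries = []  # list of {"prev", "event", "hash"} dicts

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # sort_keys makes the serialization, and therefore the hash, deterministic
        payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = self._digest(prev_hash, event)
        self.entries.append({"prev": prev_hash, "event": event, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # Recompute every link; editing any past event breaks the chain from there on
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


chain = AuditChain()
chain.append({"type": "grant", "custodian": "hospital_a", "processor": "researcher_1"})
chain.append({"type": "access", "task": "task-001", "dataset": "admissions"})
print(chain.verify())  # True
chain.entries[0]["event"]["processor"] = "researcher_2"  # tamper with history
print(chain.verify())  # False
```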
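For the integration path described above, where heterogeneous sources are standardized to the OMOP Common Data Model, the following sketch maps one hypothetical admission record (the source field names are assumptions) to a subset of the OMOP `person` and `visit_occurrence` columns. The concept IDs are the standard OMOP gender and visit concepts as we understand them and should be checked against the OHDSI vocabularies; required columns such as `race_concept_id` and `visit_type_concept_id` are omitted for brevity.

```python
from datetime import date

# Standard OMOP concept IDs (verify against the OHDSI vocabularies before use)
GENDER_CONCEPTS = {"M": 8507, "F": 8532}
VISIT_CONCEPTS = {"inpatient": 9201, "outpatient": 9202, "emergency": 9203}
NO_MATCHING_CONCEPT = 0  # OMOP convention for source values with no standard concept


def to_omop(source: dict, person_id: int, visit_id: int) -> dict:
    """Map one hypothetical source admission record to partial OMOP rows."""
    birth = date.fromisoformat(source["birth_date"])
    person = {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(source["sex"], NO_MATCHING_CONCEPT),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "gender_source_value": source["sex"],  # keep the original value for traceability
    }
    visit = {
        "visit_occurrence_id": visit_id,
        "person_id": person_id,
        "visit_concept_id": VISIT_CONCEPTS.get(source["admission_type"], NO_MATCHING_CONCEPT),
        "visit_start_date": source["admitted"],
        "visit_end_date": source["discharged"],
        "visit_source_value": source["admission_type"],
    }
    return {"person": person, "visit_occurrence": visit}


rows = to_omop({"sex": "F", "birth_date": "1950-03-14", "admission_type": "inpatient",
                "admitted": "2024-01-02", "discharged": "2024-01-09"},
               person_id=1, visit_id=100)
print(rows["person"]["gender_concept_id"], rows["visit_occurrence"]["visit_concept_id"])
```

Once every node exposes the same OMOP tables, a federated query can be written once against standard column names and concept IDs instead of once per hospital schema.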
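The planned mapping of federated-network actions to PROV-DM can be illustrated with a small in-memory graph. Relation names follow PROV-DM (`used`, `wasGeneratedBy`, `wasAssociatedWith`). The `type` attributes, the permission entity's `dataset`/`processor` attributes, and the anomaly rule (an FL task that used a dataset without a matching permission) are hypothetical choices for illustration, not the project's refined model, and no PROV library is used.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    # node id -> {"kind": "entity" | "activity" | "agent", plus free-form attributes}
    nodes: dict = field(default_factory=dict)
    # (relation, subject id, object id), e.g. ("used", "task-1", "admissions_a")
    relations: list = field(default_factory=list)

    def node(self, node_id: str, kind: str, **attrs) -> None:
        self.nodes[node_id] = {"kind": kind, **attrs}

    def rel(self, relation: str, subject: str, obj: str) -> None:
        self.relations.append((relation, subject, obj))

    def targets(self, relation: str, subject: str) -> list:
        return [o for r, s, o in self.relations if r == relation and s == subject]


def unauthorized_tasks(g: ProvGraph) -> list:
    """Return FL tasks that used a dataset without a matching permission for their agent."""
    flagged = []
    for node_id, n in g.nodes.items():
        if n["kind"] != "activity" or n.get("type") != "fl_task":
            continue
        agents = set(g.targets("wasAssociatedWith", node_id))
        used = [{**g.nodes[e], "id": e} for e in g.targets("used", node_id)]
        perms = [u for u in used if u.get("type") == "permission"]
        for u in used:
            if u.get("type") != "dataset":
                continue
            if not any(p["dataset"] == u["id"] and p["processor"] in agents for p in perms):
                flagged.append(node_id)
                break
    return flagged


g = ProvGraph()
g.node("hospital_a", "agent", type="custodian")
g.node("researcher_1", "agent", type="processor")
g.node("admissions_a", "entity", type="dataset")
# Permission granting is itself an activity that generates a permission entity
g.node("grant-1", "activity", type="permission_grant")
g.node("perm-1", "entity", type="permission", dataset="admissions_a", processor="researcher_1")
g.rel("wasAssociatedWith", "grant-1", "hospital_a")
g.rel("wasGeneratedBy", "perm-1", "grant-1")
# An authorized FL task
g.node("task-1", "activity", type="fl_task")
g.rel("wasAssociatedWith", "task-1", "researcher_1")
g.rel("used", "task-1", "admissions_a")
g.rel("used", "task-1", "perm-1")
# A task that touched the dataset without any permission
g.node("task-2", "activity", type="fl_task")
g.rel("wasAssociatedWith", "task-2", "researcher_1")
g.rel("used", "task-2", "admissions_a")
print(unauthorized_tasks(g))  # ['task-2']
```

Modelling the permission as an entity generated by a grant activity keeps "who granted what" and "who used what" in the same provenance graph, so anomaly checks become graph queries.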
Team
António Bezerra
MSc in Computer Engineering, specialist in Federated Learning in Healthcare, and software engineering.
Inês Lopes Teixeira
MSc in Biomedical Engineering, specialist in healthcare project management.
Paulo Silva
Ph.D. in Information Science and Technology, expert in Machine Learning, Data Privacy Protection and Security Services for Cloud Services and IoT; 15+ publications.
Carlos Junior
MSc in Informatics Engineering, specializing in software engineering, blockchain and ML.
José Diogo Gaspar
MSc in Informatics Engineering, specialized in Intelligent Systems, with a strong interest in AI/ML, XAI, Cybersecurity for AI/AI for Cybersecurity, and Cloud Computing.
Entities
Promptly Health
A Real-World Data (RWD) Analytics company delivering technological solutions for collecting, integrating and analyzing data on the outcomes of care.
Website: https://promptlyhealth.com/en
Instituto Pedro Nunes (IPN)
A private non-profit organisation which promotes innovation and the transfer of technology, establishing the connection between the scientific and technological environment and the production sector.
Website: https://www.ipn.pt/
Centro Hospitalar e Universitário de Coimbra (CHUC)
A leading academic medical center in Portugal and the coordinating institution of the Portuguese Node of OHDSI (Observational Health Data Sciences and Informatics).
Website: https://www.chuc.min-saude.pt/
Centro Hospitalar Universitário de Santo António (CHUdSA)
A Portuguese hospital renowned for its expertise in clinical care and research, included in 11 European Reference Networks (ERNs).
Website: https://www.chusantoantonio.pt/