NFDI4Health workflow and service for synthetic data generation, assessment and risk management
Authors:
Sobhan Moazemi,
Tim Adams,
Hwei Geok NG,
Lisa Kühnel,
Julian Schneider,
Anatol-Fiete Näher,
Juliane Fluck,
Holger Fröhlich
Abstract:
Individual health data is crucial for scientific advancements, particularly in developing Artificial Intelligence (AI); however, sharing real patient information is often restricted due to privacy concerns. A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data while preserving the confidentiality of patient information. In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health. First, two state-of-the-art AI tools (namely, VAMBN and MultiNODEs) for generating synthetic health data are outlined. Further, we introduce SYNDAT (a public web-based tool), which allows users to visualize and assess the quality and risk of synthetic data produced by the generative models of their choice. Additionally, the utility of the proposed methods and the web-based tool is showcased using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Center for Cancer Registry Data of the Robert Koch Institute (RKI).
Submitted 8 August, 2024;
originally announced August 2024.
Improving Record Linkage for Health Research in Germany
Authors:
Timm Intemann,
Knut Kaulke,
Dennis-Kenji Kipker,
Vanessa Lettieri,
Christoph Stallmann,
Carsten O. Schmidt,
Lars Geidel,
Martin Bialke,
Christopher Hampf,
Dana Stahl,
Martin Lablans,
Florens Rohde,
Martin Franke,
Klaus Kraywinkel,
Joachim Kieschke,
Sebastian Bartholomäus,
Anatol-Fiete Näher,
Galina Tremper,
Mohamed Lambarki,
Stefanie March,
Fabian Prasser,
Anna Christine Haber,
Johannes Drepper,
Irene Schlünder,
Toralf Kirsten
, et al. (5 additional authors not shown)
Abstract:
Record linkage means linking data from multiple sources. This approach makes it possible to answer scientific questions that cannot be addressed using single data sources because of their limited variables. The potential of linked data for health research is enormous, as it can enhance prevention, treatment, and population health policies. Due to the sensitivity of health data, there are strict legal requirements to prevent potential misuse. However, these requirements also limit the use of health data for research, thereby hindering innovations in prevention and care. In addition, comprehensive record linkage in Germany is often challenging due to the lack of unique personal identifiers or interoperable solutions. Moreover, the need to protect data is often weighed against the importance of research aiming at healthcare enhancements: for instance, data protection officers may demand the informed consent of individual study participants for data linkage, even when this is not mandatory. Furthermore, legal frameworks may be interpreted differently on different occasions. Given both technical and legal challenges, record linkage for health research in Germany falls behind the standards of other European countries. To ensure successful record linkage, case-specific solutions must be developed, tested, and modified as necessary before implementation. This paper discusses limitations and possibilities of various data linkage approaches tailored to different use cases in compliance with the European General Data Protection Regulation. It further describes requirements for achieving a more research-friendly approach to linking health data records in Germany. Additionally, it provides recommendations to legislators. The objective of this work is to improve record linkage for health research in Germany.
Submitted 14 December, 2023;
originally announced December 2023.