Improving Record Linkage for Health Research in Germany
Authors:
Timm Intemann,
Knut Kaulke,
Dennis-Kenji Kipker,
Vanessa Lettieri,
Christoph Stallmann,
Carsten O. Schmidt,
Lars Geidel,
Martin Bialke,
Christopher Hampf,
Dana Stahl,
Martin Lablans,
Florens Rohde,
Martin Franke,
Klaus Kraywinkel,
Joachim Kieschke,
Sebastian Bartholomäus,
Anatol-Fiete Näher,
Galina Tremper,
Mohamed Lambarki,
Stefanie March,
Fabian Prasser,
Anna Christine Haber,
Johannes Drepper,
Irene Schlünder,
Toralf Kirsten,
et al. (5 additional authors not shown)
Abstract:
Record linkage means linking data from multiple sources. This approach makes it possible to answer scientific questions that cannot be addressed with a single data source because of the limited variables it contains. The potential of linked data for health research is enormous, as it can enhance prevention, treatment, and population health policies. Due to the sensitivity of health data, there are strict legal requirements to prevent potential misuse. However, these requirements also limit the use of health data for research, thereby hindering innovations in prevention and care. Also, comprehensive record linkage in Germany is often challenging due to the lack of unique personal identifiers or interoperable solutions. In practice, the need to protect data must often be weighed against the importance of research aimed at improving healthcare: for instance, data protection officers may demand informed consent from individual study participants for data linkage even when consent is not mandatory. Furthermore, legal frameworks may be interpreted differently from case to case. Given both technical and legal challenges, record linkage for health research in Germany lags behind the standards of other European countries. To ensure successful record linkage, case-specific solutions must be developed, tested, and modified as necessary before implementation. This paper discusses the limitations and possibilities of various data linkage approaches tailored to different use cases in compliance with the European General Data Protection Regulation. It further describes requirements for achieving a more research-friendly approach to linking health data records in Germany. Additionally, it provides recommendations to legislators. The objective of this work is to improve record linkage for health research in Germany.
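As a minimal illustration of why missing unique personal identifiers complicate linkage, the sketch below (not the authors' approach) joins two record sets on a normalised quasi-identifier key built from first name, last name, and date of birth. The `Record` type, its field names, the helper functions, and the sample data are all hypothetical. Exact matching on such a key is fragile: a one-letter spelling difference, or the German transliteration "Mueller" for "Müller", already prevents a match.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; no unique personal identifier is shared
    # between sources, only quasi-identifiers.
    source_id: str
    first_name: str
    last_name: str
    birth_date: str  # ISO format, e.g. "1980-05-17"


def normalise(name: str) -> str:
    """Strip accents and surrounding whitespace and lower-case the name,
    so that 'Müller ' and 'muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.strip().lower()


def linkage_key(r: Record) -> tuple[str, str, str]:
    """Quasi-identifier key used in place of a unique personal identifier."""
    return (normalise(r.first_name), normalise(r.last_name), r.birth_date)


def link(a: list[Record], b: list[Record]) -> list[tuple[str, str]]:
    """Return (id_a, id_b) pairs whose linkage keys match exactly."""
    index: dict[tuple[str, str, str], list[str]] = {}
    for r in b:
        index.setdefault(linkage_key(r), []).append(r.source_id)
    pairs = []
    for r in a:
        for id_b in index.get(linkage_key(r), []):
            pairs.append((r.source_id, id_b))
    return pairs


# Hypothetical example data from two sources (e.g. a registry and a cohort).
registry = [
    Record("R1", "Anna", "Müller", "1980-05-17"),
    Record("R2", "Jan", "Schmidt", "1975-01-02"),
]
cohort = [
    Record("C7", "anna ", "Muller", "1980-05-17"),  # matches R1 after normalisation
    Record("C8", "Jan", "Schmid", "1975-01-02"),    # typo in surname: no match
]

print(link(registry, cohort))  # [('R1', 'C7')]
```

In real health-research settings, linkage would typically operate on pseudonymised or encoded identifiers and use error-tolerant matching under the legal constraints discussed above; this plaintext exact-match version only shows the basic join.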
Submitted 14 December, 2023;
originally announced December 2023.
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results
Authors:
Lisa Kühnel,
Julian Schneider,
Ines Perrar,
Tim Adams,
Fabian Prasser,
Ute Nöthlings,
Holger Fröhlich,
Juliane Fluck
Abstract:
Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e., data generated through a randomised process that have statistical properties similar to those of the original data but no one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the method and by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.
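The point that synthetic data need checks beyond descriptive statistics can be illustrated with a deliberately naive generator. The sketch below is not the state-of-the-art method used in the study: it fits a normal distribution to each column independently and samples from it. The variable names `energy_kcal` and `sugar_g`, the helper functions, and the simulated "real" data are all hypothetical. Column means are reproduced closely, but the strong correlation between the two variables disappears, so an association analysis on such synthetic data would not reproduce the real-world result.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def synthesise_marginals(columns, n, rng):
    """Toy generator (hypothetical): sample each column independently from a
    normal distribution fitted to that column's mean and standard deviation.
    Relationships between columns are deliberately ignored."""
    synthetic = {}
    for name, values in columns.items():
        mu, sigma = statistics.fmean(values), statistics.stdev(values)
        synthetic[name] = [rng.gauss(mu, sigma) for _ in range(n)]
    return synthetic


rng = random.Random(0)  # fixed seed for reproducibility
# Simulated "real" data: sugar intake depends strongly on energy intake.
energy = [rng.gauss(2000, 300) for _ in range(5000)]
sugar = [0.05 * e + rng.gauss(0, 5) for e in energy]
real = {"energy_kcal": energy, "sugar_g": sugar}

synthetic = synthesise_marginals(real, 5000, rng)

# Descriptive statistics look fine: means are close.
for name in real:
    print(name, round(statistics.fmean(real[name]), 1),
          round(statistics.fmean(synthetic[name]), 1))
# The association between the variables, however, is lost.
print("corr real:", round(pearson(energy, sugar), 2))
print("corr synthetic:", round(pearson(synthetic["energy_kcal"], synthetic["sugar_g"]), 2))
```

Generators that model the joint distribution, as more advanced methods aim to do, are needed to preserve such relationships; comparing correlations or re-running the target analysis is one simple check that goes beyond per-column summaries.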
Submitted 12 May, 2023;
originally announced May 2023.