Preserving privacy in healthcare: A systematic review of deep learning approaches for synthetic data generation

Comput Methods Programs Biomed. 2024 Dec 28:260:108571. doi: 10.1016/j.cmpb.2024.108571. Online ahead of print.

Abstract

Background: Data sharing in healthcare is vital for advancing research and personalized medicine. However, the process is hindered by privacy, ethical, and legal challenges associated with patient data. Synthetic data generation emerges as a promising solution, replicating statistical properties of real data while enhancing privacy protection.

Methods: This systematic review examines deep learning techniques for synthetic data generation in healthcare, focusing on their ability to maintain data utility and enhance privacy. Studies from Scopus, Web of Science, PubMed, and IEEE databases published between 2019 and 2023 were analyzed. Key methods explored include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. Evaluation metrics encompass data resemblance, utility, and privacy preservation, with special attention to privacy-enhancing methods like differential privacy and federated learning.
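To make the resemblance and privacy axes concrete, the following is a minimal sketch and not the metrics used by the reviewed studies. It assumes numeric tabular data held in NumPy arrays; the function names `mean_abs_diff` and `distance_to_closest_record` are hypothetical. The distance to the closest real record is used here only as a simple proxy for memorization, and utility (for example, training on synthetic data and testing on real data) is not covered.

```python
import numpy as np

def mean_abs_diff(real, synth):
    """Resemblance proxy: mean absolute gap between per-feature means (lower is closer)."""
    return float(np.mean(np.abs(real.mean(axis=0) - synth.mean(axis=0))))

def distance_to_closest_record(real, synth):
    """Privacy proxy: Euclidean distance from each synthetic row to its nearest real row.

    Distances at or near zero suggest the generator copied real records.
    """
    diffs = synth[:, None, :] - real[None, :, :]   # shape (n_synth, n_real, n_features)
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

# Toy data standing in for real and synthetic patient tables (assumption).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
synth = rng.normal(size=(200, 4))   # independent draws from the same distribution
copied = real[:50].copy()           # a "generator" that memorized real rows

resemblance = mean_abs_diff(real, synth)
dcr_synth = distance_to_closest_record(real, synth)
dcr_copied = distance_to_closest_record(real, copied)

print(f"mean abs diff of feature means: {resemblance:.3f}")
print(f"median DCR, independent synthetic rows: {np.median(dcr_synth):.3f}")
print(f"median DCR, copied rows: {np.median(dcr_copied):.3f}")
```

In this toy setup the copied rows sit at distance zero from the real data, which is the pattern such a check is meant to flag.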

Results: GANs and VAEs demonstrated robust capabilities in generating realistic synthetic data for tabular, signal, image, and multi-modal datasets. Privacy-preserving approaches such as differential privacy and adversarial training significantly reduced re-identification risks while maintaining data fidelity. However, challenges persist in preserving temporal correlations, reducing biases, and aligning with regulatory frameworks, particularly for longitudinal and high-dimensional data.
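The trade-off between privacy and fidelity under differential privacy can be illustrated with a small sketch. In deep generative models, differential privacy is typically applied during training by clipping and adding noise to gradients; the example below instead uses the simpler Laplace mechanism on a single count, purely as an illustration and not as the approach of any reviewed study. The function name `laplace_count` and the toy cohort are assumptions.

```python
import numpy as np

def laplace_count(flags, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = float(np.sum(flags))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
# Hypothetical cohort: 1 marks a patient with some condition.
flags = rng.integers(0, 2, size=1000)
true_count = float(flags.sum())

def mean_abs_error(epsilon, trials=1000):
    """Average absolute error of the noisy count over repeated releases."""
    releases = np.array([laplace_count(flags, epsilon, rng) for _ in range(trials)])
    return float(np.mean(np.abs(releases - true_count)))

err_strict = mean_abs_error(epsilon=0.1)   # stronger privacy, more noise
err_loose = mean_abs_error(epsilon=10.0)   # weaker privacy, less noise

print(f"mean abs error at epsilon=0.1: {err_strict:.2f}")
print(f"mean abs error at epsilon=10:  {err_loose:.2f}")
```

A smaller epsilon gives a stronger privacy guarantee at the cost of a noisier released value, which is the same tension between re-identification risk and data fidelity described above.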

Conclusion: Synthetic data generation holds significant potential for privacy-preserving data sharing in healthcare. Further research is needed to develop algorithms and evaluation frameworks that ensure both the quality and the privacy of synthetic data. Collaboration between technologists and policymakers is essential to create comprehensive guidelines that foster secure and effective data sharing in healthcare.

Keywords: Deep learning; Generative Adversarial Networks (GANs); Healthcare data sharing; Privacy preservation; Synthetic data generation.

Publication types

  • Review