Privacy enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis

NPJ Digit Med. 2024 Oct 20;7(1):293. doi: 10.1038/s41746-024-01290-7.

Abstract

The success of deep learning (DL) relies heavily on training data, from which DL models encode information. Consequently, developing and deploying DL models exposes that data to potential privacy breaches, which are particularly critical in data-sensitive contexts such as medicine. We propose a new technique, named DiffGuard, that generates realistic and diverse synthetic medical images with annotations, indistinguishable from real images even to experts, to replace real data for DL model training, severing the direct link between models and patient data and thereby enhancing privacy. We demonstrate that DiffGuard improves privacy protection, with much less data leakage and better resistance against privacy attacks on both data and models. It also improves the accuracy and generalizability of DL models for segmentation and classification of mediastinal neoplasms in a multi-center evaluation. We expect that our solution will pave the way toward privacy-preserving DL for precision medicine, promote data and model sharing, and inspire further innovation in artificial-intelligence-generated-content technologies for medicine.