Background and objective: Single-source domain generalization (SSDG) aims to generalize a deep learning (DL) model trained on one source dataset to multiple unseen datasets. This is important for the clinical applications of DL-based models to breast cancer screening, wherein a DL-based model is commonly developed in an institute and then tested in other institutes. One challenge of SSDG is to alleviate the domain shifts using only one domain dataset.
Methods: The present study proposed a domain-invariant features learning framework (DIFLF) for single-source domain. Specifically, a style-augmentation module (SAM) and a content-style disentanglement module (CSDM) are proposed in DIFLF. SAM includes two different color jitter transforms, which transforms each mammogram in the source domain into two synthesized mammograms with new styles. Thus, it can greatly increase the feature diversity of the source domain, reducing the overfitting of the trained model. CSDM includes three feature disentanglement units, which extracts domain-invariant content (DIC) features by disentangling them from domain-specific style (DSS) features, reducing the influence of the domain shifts resulting from different feature distributions. Our code is available for open access on Github (https://github.com/85675/DIFLF).
Results: DIFLF is trained in a private dataset (PRI1), and tested first in another private dataset (PRI2) with similar feature distribution to PRI1 and then tested in two public datasets (INbreast and MIAS) with greatly different feature distributions from PRI1. As revealed by the experiment results, DIFLF presents excellent performance for classifying mammograms in the unseen target datasets of PRI2, INbreast, and MIAS. The accuracy and AUC of DIFLF are 0.917 and 0.928 in PRI2, 0.882 and 0.893 in INbreast, 0.767 and 0.710 in MIAS, respectively.
Conclusions: DIFLF can alleviate the influence of domain shifts only using one source dataset. Moreover, DIFLF can achieve an excellent mammogram classification performance even in the unseen datasets with great feature distribution differences from the training dataset.
Keywords: Breast cancer; Content-style disentanglement module; Deep learning; Domain generalization; Mammogram, Style-augmentation module.
Copyright © 2025 Elsevier B.V. All rights reserved.