Cross-site Validation of AI Segmentation and Harmonization in Breast MRI

Yu Huang; Nicholas J Leotta; Lukas Hirsch; Roberto Lo Gullo; Mary Hughes; Jeffrey Reiner; Nicole B Saphier; Kelly S Myers; Babita Panigrahi; Emily Ambinder; Philip Di Carlo; Lars J Grimm; Dorothy Lowell; Sora Yoon; Sujata V Ghate; Lucas C Parra; Elizabeth J Sutton

doi:10.1007/s10278-024-01266-9

J Imaging Inform Med. 2024 Sep 25. doi: 10.1007/s10278-024-01266-9. Online ahead of print.

Authors

Yu Huang^#¹ ², Nicholas J Leotta^#¹, Lukas Hirsch¹, Roberto Lo Gullo², Mary Hughes², Jeffrey Reiner², Nicole B Saphier², Kelly S Myers³, Babita Panigrahi³, Emily Ambinder³, Philip Di Carlo³, Lars J Grimm⁴, Dorothy Lowell⁴, Sora Yoon⁴, Sujata V Ghate⁴, Lucas C Parra⁵, Elizabeth J Sutton²

Affiliations

¹ Department of Biomedical Engineering, The City College of the City University of New York, 160 Convent Ave, New York, NY, 10031, USA.
² Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
³ Department of Radiology and Radiological Science, Johns Hopkins Medicine, Baltimore, MD, 21224, USA.
⁴ Department of Radiology, Duke University School of Medicine, Durham, NC, 27710, USA.
⁵ Department of Biomedical Engineering, The City College of the City University of New York, 160 Convent Ave, New York, NY, 10031, USA.

^# Contributed equally.

PMID: 39320547
DOI: 10.1007/s10278-024-01266-9

Abstract

This work aims to perform a cross-site validation of automated segmentation of breast cancers in MRI and to compare its performance with that of radiologists. A three-dimensional (3D) U-Net was trained to segment cancers in dynamic contrast-enhanced axial MRIs using a large dataset from Site 1 (n = 15,266; 449 malignant and 14,817 benign). Performance was validated on site-specific test data from this and two additional sites, and on common, publicly available test data. Four radiologists from each of the three clinical sites provided two-dimensional (2D) segmentations as ground truth. Segmentation performance did not differ between the network and radiologists on the test data from Sites 1 and 2 or on the common public data (median Dice score: Site 1, network 0.86 vs. radiologist 0.85, n = 114; Site 2, 0.91 vs. 0.91, n = 50; common, 0.93 vs. 0.90). For Site 3, an affine input layer was fine-tuned using segmentation labels, resulting in comparable performance between the network and radiologists (0.88 vs. 0.89, n = 42). Radiologist performance differed on the common test data, and the network numerically outperformed 11 of the 12 radiologists (median Dice: 0.85-0.94, n = 20). In conclusion, a deep network with a novel supervised harmonization technique matches radiologists' performance in MRI tumor segmentation across clinical sites. We make code and weights publicly available to promote reproducible AI in radiology.
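For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Dice score between two binary 2D masks, and a median over several cases, could be computed. It is an illustration only and not the authors' released code: the function name dice_score, the list-of-rows mask format, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
from statistics import median


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary 2D masks given as lists of rows."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    overlap = size_pred = size_truth = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            overlap += p and t   # pixel marked in both masks
            size_pred += p       # pixel marked in the prediction
            size_truth += t      # pixel marked in the reference
    total = size_pred + size_truth
    # Assumed convention for this sketch: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical data, not taken from the study).
network_mask = [[0, 1, 1],
                [0, 1, 0]]
reader_mask = [[0, 1, 0],
               [0, 1, 1]]

case_scores = [
    dice_score(network_mask, reader_mask),   # partial overlap
    dice_score(reader_mask, reader_mask),    # identical masks
]
median_dice = median(case_scores)
```

A score of 1 means the two outlines coincide exactly and 0 means they do not overlap. Summarizing per-case scores by their median, as the abstract does, limits the influence of a few poorly segmented cases.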

Keywords: Breast cancer segmentation; Cross-site evaluation; Deep learning; Dynamic contrast enhancement; Harmonization; MRI.
