Background: Studies on deep learning (DL)-based models in breast ultrasound (US) remain at the early stage due to a lack of large datasets for training and independent test sets for verification. We aimed to develop a DL model for differentiating benign from malignant breast lesions on US using a large multicenter dataset and explore the model's ability to assist the radiologists.
Methods: A total of 14,043 US images from 5012 women were prospectively collected from 32 hospitals. To develop the DL model, the patients from 30 hospitals were randomly divided into a training cohort (n = 4149) and an internal test cohort (n = 466). The remaining 2 hospitals (n = 397) were used as the external test cohorts (ETC). We compared the model with the prospective Breast Imaging Reporting and Data System assessment and five radiologists. We also explored the model's ability to assist the radiologists using two different methods.
Results: The model demonstrated excellent diagnostic performance with the ETC, with a high area under the receiver operating characteristic curve (AUC, 0.913), sensitivity (88.84%), specificity (83.77%), and accuracy (86.40%). In the comparison set, the AUC was similar to that of the expert (p = 0.5629) and one experienced radiologist (p = 0.2112) and significantly higher than that of three inexperienced radiologists (p < 0.01). After model assistance, the accuracies and specificities of the radiologists were substantially improved without loss in sensitivities.
Conclusions: The DL model yielded satisfactory predictions in distinguishing benign from malignant breast lesions. The model showed the potential value in improving the diagnosis of breast lesions by radiologists.
Keywords: Artificial intelligence; Breast neoplasms; Deep learning; Diagnosis; Ultrasonography.
© 2022. The Author(s).