Objectives: We aimed to assess the performance of radiomics and machine learning (ML) in classifying non-cystic benign and malignant breast lesions on ultrasound images, to compare the accuracy of ML with that of a breast radiologist, and to determine whether the radiologist's performance improves with the aid of ML.
Methods: This retrospective study included patients from two institutions. A total of 135 lesions from Institution 1 were used to train and test the ML model with cross-validation. Radiomic features were extracted from manually annotated images and underwent a multistep selection process in which non-reproducible, low-variance, and highly intercorrelated features were removed from the dataset. Then, 66 lesions from Institution 2 were used as an external test set for the ML model and to assess the performance of a radiologist without and with the aid of ML; comparisons were made using McNemar's test.
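The low-variance and intercorrelation filters mentioned in the Methods can be illustrated with a minimal sketch. This is not the study's pipeline: the function name `filter_features`, the thresholds `min_variance` and `max_abs_corr`, and the toy feature values are all assumptions made for illustration, since the abstract does not report the cut-offs used. The reproducibility filter is omitted because it would require repeated annotations of the same lesions.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_features(columns, min_variance=0.01, max_abs_corr=0.80):
    """Return names of features that pass both filters.

    columns: dict mapping feature name -> list of values (one per lesion).
    Both thresholds are hypothetical defaults, not values from the study.
    """
    kept = []
    for name, values in columns.items():
        # Step 1: drop features that barely vary across lesions.
        if pvariance(values) < min_variance:
            continue
        # Step 2: drop features highly correlated with one already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


# Toy data (invented): one constant feature, one duplicate up to scale.
features = {
    "f_constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "f_a": [0.1, 0.4, 0.35, 0.8, 0.9],
    "f_a_scaled": [0.2, 0.8, 0.7, 1.6, 1.8],
    "f_b": [0.9, 0.1, 0.8, 0.2, 0.5],
}
selected = filter_features(features)
print(selected)
```

In this sketch the first feature of a correlated pair is retained simply because it is encountered first; a real pipeline might instead keep the feature that is more strongly associated with the outcome.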
Results: After feature selection, 10 of the 520 extracted features were used to train a random forest classifier. Its accuracy in the training set was 82% (standard deviation [SD] ± 6%), with an area under the receiver operating characteristic curve (AUC) of 0.90 (SD ± 0.06), while its accuracy on the test set was 82% (95% confidence interval [CI], 70-90%), with an AUC of 0.82 (95% CI, 0.70-0.93). The classifier performed significantly better than the baseline reference (p = 0.0098) but did not differ significantly from the radiologist (79.4%, p = 0.815). The radiologist's performance improved when using ML (80.2%), but not significantly (p = 0.508).
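For readers unfamiliar with McNemar's test, the paired comparison behind p-values of this kind can be sketched as follows. The abstract does not state which variant of the test was used, so the exact binomial form shown here is an assumption, and the discordant-pair counts in the example are invented for illustration and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases classified correctly by reader A only.
    c: cases classified correctly by reader B only.
    Concordant pairs (both right or both wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 3 lesions correct only without ML, 5 correct only with ML.
p_value = mcnemar_exact(3, 5)
print(f"p = {p_value:.4f}")
```

Because only discordant pairs contribute, a small gain in accuracy on a test set of modest size can easily yield a non-significant result.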
Conclusions: Radiomic analysis combined with ML showed promising results in differentiating benign from malignant breast lesions on ultrasound images.
Key points: • Machine learning showed good accuracy in discriminating benign from malignant breast lesions • The machine learning classifier's performance was comparable to that of a breast radiologist • The radiologist's accuracy improved with machine learning, but not significantly.
Keywords: Breast cancer; Machine learning; Ultrasound.
© 2021. The Author(s).