Purpose: We studied inter-reader variability in the manual delineation of cystic renal masses (CRMs) on computed tomography (CT) images and its effect on the performance of a machine learning algorithm in distinguishing benign from potentially malignant CRMs. In addition, we assessed whether including higher-order robust radiomic features improves classification performance over first-order features alone.
Methods: A total of 230 CRMs were independently delineated by two radiologists. By applying a combination of random fluctuations, dilation, and erosion operations to the original regions of interest (ROIs), we generated four additional sets of synthetic ROIs that realistically capture inter-reader variability, as confirmed by Dice coefficient measurements and visual assessment. We then identified robust features as those with an intraclass correlation coefficient (ICC) above 0.85 across these datasets. We used tenfold stratified cross-validation (CV) to train and test a random forest model for classifying CRMs as benign or potentially malignant.
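As an illustration of the ROI perturbation and overlap check described in the Methods, the following is a minimal sketch, not the study's implementation. It assumes 2D masks stored as sets of (row, col) pixel coordinates, a one-pixel 4-connected structuring element, and hypothetical helper names (`dice`, `dilate`, `erode`). The random-fluctuation step is omitted, and image boundaries are not enforced.

```python
# Hypothetical helpers: a mask is a set of (row, col) pixel coordinates.
NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|); taken as 1.0 for two empty masks."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def dilate(mask):
    """One-pixel dilation: add every 4-connected neighbor of a mask pixel."""
    grown = set(mask)
    for r, c in mask:
        for dr, dc in NEIGHBORS_4:
            grown.add((r + dr, c + dc))
    return grown


def erode(mask):
    """One-pixel erosion: keep pixels whose four neighbors all lie in the mask."""
    return {
        (r, c)
        for r, c in mask
        if all((r + dr, c + dc) in mask for dr, dc in NEIGHBORS_4)
    }


# Toy 5x5 square ROI standing in for one reader's delineation (assumed data).
roi = {(r, c) for r in range(5) for c in range(5)}
dilated = dilate(roi)
eroded = erode(roi)

# Overlap between the original ROI and each synthetic variant.
print(round(dice(roi, dilated), 3), round(dice(roi, eroded), 3))  # 0.714 0.529
```

On real CRM contours, which span many more pixels than this toy square, a one-pixel change would alter the Dice coefficient far less; the small example only makes the arithmetic easy to check by hand.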
Results: The mean area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value were 0.87, 0.82, 0.90, 0.85, and 0.93, respectively. With first-order features alone, the corresponding values were nearly identical.
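For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) follow from confusion-matrix counts. It is illustrative only: the function name `binary_metrics`, the label coding (1 for potentially malignant, 0 for benign), and the toy labels are assumptions, not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, and NPV for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among actual positives
        "specificity": ratio(tn, tn + fp),  # true negatives among actual negatives
        "ppv": ratio(tp, tp + fp),          # true positives among predicted positives
        "npv": ratio(tn, tn + fn),          # true negatives among predicted negatives
    }


# Toy labels (assumed): 1 = potentially malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a tenfold CV setting, such metrics would typically be computed on each held-out fold and then averaged, which is consistent with the mean values reported above.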
Conclusion: The AUC ranged from 0.83 ± 0.09 to 0.93 ± 0.04 for the robust and uncorrelated features and from 0.84 ± 0.09 to 0.91 ± 0.04 for the first-order features. Our study indicates that first-order features alone are sufficient for classifying CRMs and that including higher-order features does not necessarily improve performance.
Keywords: Cystic renal mass; Inter-reader variability; Machine learning; Radiomics; Robust features.
© 2021. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.