The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images

Federica Corso; Giulia Tini; Giuliana Lo Presti; Noemi Garau; Simone Pietro De Angelis; Federica Bellerba; Lisa Rinaldi; Francesca Botta; Stefania Rizzo; Daniela Origgi; Chiara Paganelli; Marta Cremonesi; Cristiano Rampinelli; Massimo Bellomi; Luca Mazzarella; Pier Giuseppe Pelicci; Sara Gandini; Sara Raimondi

doi:10.3390/cancers13123088

Cancers (Basel). 2021 Jun 21;13(12):3088. doi: 10.3390/cancers13123088.

Authors

Federica Corso¹,²,³, Giulia Tini¹, Giuliana Lo Presti⁴, Noemi Garau⁵,⁶, Simone Pietro De Angelis⁷, Federica Bellerba⁷, Lisa Rinaldi⁸,⁹, Francesca Botta⁴, Stefania Rizzo¹⁰, Daniela Origgi⁴, Chiara Paganelli⁵, Marta Cremonesi⁸, Cristiano Rampinelli⁶, Massimo Bellomi⁶, Luca Mazzarella¹,¹¹, Pier Giuseppe Pelicci¹,¹², Sara Gandini⁷, Sara Raimondi⁷

Affiliations

¹ Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, via Adamello 16, 20139 Milan, Italy.
² Department of Mathematics (DMAT), Politecnico di Milano, via Edoardo Bonardi 9, 20133 Milan, Italy.
³ Centre for Analysis, Decision and Society (CADS), Human Technopole, via Cristina Belgioioso 171, 20157 Milan, Italy.
⁴ Medical Physics Unit, IEO European Institute of Oncology IRCCS, via Ripamonti 435, 20141 Milan, Italy.
⁵ Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, via Ponzio 34, 20133 Milan, Italy.
⁶ Division of Radiology, IEO European Institute of Oncology IRCCS, via Ripamonti 435, 20141 Milan, Italy.
⁷ Molecular and Pharmaco-Epidemiology Unit, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, via Adamello 16, 20139 Milan, Italy.
⁸ Radiation Research Unit, IEO European Institute of Oncology IRCCS, via Giuseppe Ripamonti 435, 20141 Milan, Italy.
⁹ Department of Physics, University of Pavia, via Bassi 6, 27100 Pavia, Italy.
¹⁰ Clinica di Radiologia EOC, Istituto Imaging della Svizzera Italiana (IIMSI), via Tesserete 46, 6900 Lugano, Switzerland.
¹¹ Division of Early Drug Development for Innovative Therapies, IEO European Institute of Experimental Oncology IRCCS, via Ripamonti 435, 20141 Milan, Italy.
¹² Department of Oncology and Hematology-Oncology, University of Milan, via Festa del Perdono 7, 20122 Milan, Italy.

Abstract

Radiomics uses high-dimensional sets of imaging features to predict biological characteristics of tumors and clinical outcomes. The choice of algorithm used to analyze radiomic features and make predictions strongly affects the results, so identifying adequate machine learning methods for radiomic applications is crucial. In this study, we aim to identify suitable analysis approaches for radiomic-based binary predictions according to sample size, outcome balancing, and the strength of the feature-outcome association. Simulated data were obtained by reproducing the correlation structure among 168 radiomic features extracted from Computed Tomography (CT) images of 270 Non-Small-Cell Lung Cancer (NSCLC) patients and their association with lymph node status. The performance of six classifiers combined with six feature selection (FS) methods was assessed on the simulated data using the AUC (Area Under the Receiver Operating Characteristic Curve), sensitivity, and specificity. For all FS methods and regardless of association strength, the tree-based classifiers Random Forest and Extreme Gradient Boosting performed well (AUC ≥ 0.73) and showed the best trade-off between sensitivity and specificity. On small samples, performance was generally lower and more variable than on medium and large samples. FS methods generally did not improve performance. Thus, in radiomic studies, we suggest evaluating the choice of FS method and classifier in light of the specific sample size, balancing, and association strength.

Keywords: CT images; balancing; classification; feature selection; lung cancer; machine learning; radiomics; sample size; signal; simulation.
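
The simulation-and-evaluation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy exchangeable correlation matrix, the logistic outcome model, the coefficient values, and the function names `simulate_data`, `auc_score`, and `sensitivity_specificity` are all assumptions made for illustration. The study itself reproduced the correlation structure of 168 real radiomic features; here only five synthetic features are used.

```python
import numpy as np


def simulate_data(n_samples, corr, beta, intercept=0.0, seed=0):
    """Draw correlated Gaussian features and a binary outcome.

    Assumption: the outcome follows a logistic model in the features.
    `beta` controls the association strength, `intercept` the class balance.
    """
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_samples)
    prob = 1.0 / (1.0 + np.exp(-(X @ beta + intercept)))
    y = rng.binomial(1, prob)
    return X, y


def auc_score(y_true, scores):
    """AUC computed from the Mann-Whitney U statistic (ties get average ranks)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u_stat = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)


# Toy setting: 5 features with pairwise correlation 0.5 (hypothetical values).
n_features = 5
corr = np.full((n_features, n_features), 0.5)
np.fill_diagonal(corr, 1.0)
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0])  # only two features carry signal

# A negative intercept makes the positive class the minority (unbalanced outcome).
X, y = simulate_data(300, corr, beta, intercept=-1.0, seed=42)

# Score with the true linear predictor; a fitted classifier would go here instead.
scores = X @ beta
y_pred = (scores > 1.0).astype(int)  # equivalent to predicted probability > 0.5

print("AUC:", auc_score(y, scores))
print("Sensitivity, specificity:", sensitivity_specificity(y, y_pred))
```

Varying `n_samples`, `intercept`, and the magnitude of `beta` in this sketch corresponds loosely to the three factors the study considers: sample size, outcome balancing, and association strength.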