Purpose: To evaluate the efficacy of prominent machine learning algorithms in predicting normal tissue complication probability using clinical data obtained from 2 distinct disease sites, and to develop a software tool that automatically identifies the best-performing algorithm for any given labeled data set.
Methods and materials: We obtained 3 sets of radiation toxicity data (478 patients) from our clinic: gastrointestinal toxicity, radiation pneumonitis, and radiation esophagitis. These data comprised clinicopathological and dosimetric information for patients diagnosed with non-small cell lung cancer and anal squamous cell carcinoma. Each data set was modeled with 11 commonly used machine learning algorithms (elastic net, least absolute shrinkage and selection operator [LASSO], random forest, random forest regression, support vector machine, extreme gradient boosting, light gradient boosting machine, k-nearest neighbors, neural network, Bayesian-LASSO, and Bayesian neural network) by randomly dividing the data set into training and test sets. The training set was used to build and tune each model, and the test set was used to assess it by computing performance metrics. This process was repeated 100 times for each algorithm on each data set. Figures were generated to visually compare the performance of the algorithms, and a graphical user interface was developed to automate the entire process.
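For illustration, a minimal sketch of this repeated random-split comparison is given below. It assumes scikit-learn estimators (an L1-penalized logistic regression standing in for LASSO and a random forest) and synthetic stand-in data; none of these specific choices are taken from the study itself.

```python
# Hedged sketch of the repeated random-split benchmarking described above.
# Assumptions (not from the paper): scikit-learn models approximate 2 of the
# 11 algorithms, and X, y are synthetic stand-ins for a labeled toxicity set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (478 patients, as in the study's pooled cohort).
X, y = make_classification(n_samples=478, n_features=20, random_state=0)

models = {
    "LASSO (L1 logistic)": LogisticRegression(penalty="l1", solver="liblinear"),
    "random forest": RandomForestClassifier(n_estimators=200),
}

n_repeats = 100  # the study repeats the random split 100 times per algorithm
scores = {name: [] for name in models}
for seed in range(n_repeats):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    for name, model in models.items():
        model.fit(X_tr, y_tr)                   # train (tuning omitted for brevity)
        prob = model.predict_proba(X_te)[:, 1]  # scores on the held-out test set
        scores[name].append(average_precision_score(y_te, prob))

# Report mean ± standard deviation across the 100 splits, as in the abstract.
for name, vals in scores.items():
    print(f"{name}: AUPRC = {np.mean(vals):.3f} ± {np.std(vals):.3f}")
```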
Results: LASSO achieved the highest area under the precision-recall curve for radiation esophagitis (0.807 ± 0.067), random forest for gastrointestinal toxicity (0.726 ± 0.096), and the neural network for radiation pneumonitis (0.878 ± 0.060). The corresponding areas under the curve were 0.754 ± 0.069, 0.889 ± 0.043, and 0.905 ± 0.045, respectively. The graphical user interface was used to compare all algorithms for each data set automatically. When the area under the precision-recall curve was averaged across all toxicities, Bayesian-LASSO was the best model.
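The two reported metrics can be computed as shown in this hedged illustration, which uses hypothetical labels and model scores rather than the study's data:

```python
# Assumed illustration: y_true are hypothetical binary toxicity labels and
# y_score are hypothetical predicted probabilities from a fitted model.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)                                    # toy labels
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=100), 0, 1)  # toy scores

print("AUPRC:", average_precision_score(y_true, y_score))  # precision-recall summary
print("AUC:  ", roc_auc_score(y_true, y_score))            # receiver operating characteristic summary
```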
Conclusions: Our results show that no single algorithm performed best across all data sets. Therefore, it is important to compare multiple algorithms when training an outcome prediction model on a new data set. The graphical user interface created for this study automatically compares the performance of all 11 algorithms on any given data set.