A comparison of machine learning methods for predicting recurrence and death after curative-intent radiotherapy for non-small cell lung cancer: Development and validation of multivariable clinical prediction models

EBioMedicine. 2022 Mar:77:103911. doi: 10.1016/j.ebiom.2022.103911. Epub 2022 Mar 3.

Abstract

Background: Surveillance is universally recommended for non-small cell lung cancer (NSCLC) patients treated with curative-intent radiotherapy, yet high-quality evidence to inform optimal surveillance strategies is lacking. Machine learning has shown promise for accurate outcome prediction across a variety of health conditions. The purpose of this study was to utilise readily available patient, tumour and treatment data to develop, validate and externally test machine learning models for predicting recurrence, recurrence-free survival (RFS) and overall survival (OS) at 2 years from treatment.

Methods: A retrospective, multicentre study of patients receiving curative-intent radiotherapy for NSCLC was undertaken. A total of 657 patients from 5 hospitals were eligible for inclusion. Data pre-processing derived 34 features for predictive modelling. Combinations of 8 feature reduction methods and 10 machine learning classification algorithms were compared, producing risk-stratification models for predicting recurrence, RFS and OS. Models were compared using 10-fold cross-validation and an external test set, and benchmarked against TNM stage and performance status. The Youden index was derived from validation-set ROC curves to distinguish high- and low-risk groups, and Kaplan-Meier analyses were performed.
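The abstract does not describe how the cut-off or the survival curves were computed. As a rough illustration only, the sketch below assumes each patient has a continuous model risk score and a binary 2-year outcome label (1 = event), picks the score cut-off that maximises Youden's J (sensitivity + specificity - 1), and then computes a product-limit (Kaplan-Meier) estimate for the resulting high-risk group. The function names `youden_threshold` and `kaplan_meier` are hypothetical, scores at or above the cut-off are assumed to mean high risk, and the toy numbers are not data from the study.

```python
from typing import List, Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Return the score cut-off that maximises Youden's J = sensitivity + specificity - 1.

    Assumes labels are 0/1 with both classes present, and that a score at or
    above the cut-off is classed as high risk.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):  # each distinct score is a candidate ROC point
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cut-off on ties
            best_cut, best_j = cut, j
    return best_cut


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate as (event_time, survival) pairs.

    events[i] is 1 if the event (e.g. recurrence or death) was observed at
    times[i], and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed  # events and censorings both leave the risk set
    return curve


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    val_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
    cut = youden_threshold(val_scores, val_labels)
    # Hypothetical test cohort: (risk score, follow-up days, event observed).
    cohort = [(0.9, 100, 1), (0.5, 200, 0), (0.7, 200, 1),
              (0.45, 300, 1), (0.8, 400, 0), (0.2, 730, 0)]
    high = [(t, e) for s, t, e in cohort if s >= cut]
    times, events = zip(*high)
    print("cut-off:", cut)
    print("high-risk KM:", kaplan_meier(times, events))
```

In practice a survival library (for example, lifelines' `KaplanMeierFitter`) would normally be used; the hand-written estimator is shown only to make the product-limit step explicit.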

Findings: Median follow-up time was 852 days. Parameters were well matched across the training-validation and external test sets: mean age was 73 and 71, respectively, and recurrence, RFS and OS rates at 2 years were 43% vs 34%, 54% vs 47% and 54% vs 47%, respectively. The respective validation and test set AUCs were as follows: 1) RFS: 0·682 (0·575-0·788) and 0·681 (0·597-0·766); 2) recurrence: 0·687 (0·582-0·793) and 0·722 (0·635-0·81); and 3) OS: 0·759 (0·663-0·855) and 0·717 (0·634-0·8). Our models were superior to TNM stage and performance status in predicting recurrence and OS.
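The abstract reports AUCs with intervals but does not say how the intervals were obtained. The sketch below assumes the AUC is computed from a model's risk scores against the binary 2-year outcome, using the rank interpretation of the AUC, and uses a percentile bootstrap as one common way of obtaining an interval; DeLong's method is a frequently used alternative. The function names `auc` and `bootstrap_auc_ci` are hypothetical and the toy data are not from the study.

```python
import random
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half). Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined unless both classes are drawn
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    s = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    y = [0, 0, 0, 1, 0, 1, 1, 1]
    print("AUC:", auc(s, y))
    print("95% bootstrap interval:", bootstrap_auc_ci(s, y))
```

The pairwise loop is O(n_pos × n_neg), which is adequate for cohorts of a few hundred patients; for larger data a rank-based formula or scikit-learn's `roc_auc_score` would be the usual choice.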

Interpretation: This robust, ready-to-use machine learning method, validated and externally tested, sets the stage for future clinical trials of quantitative, personalised risk stratification and surveillance following curative-intent radiotherapy for NSCLC.

Funding: A full list of funding bodies that contributed to this study can be found in the Acknowledgements section.

Keywords: Early detection; Machine learning; Non-small cell lung cancer; Overall survival; Prediction; Radiotherapy; Recurrence.

Publication types

  • Multicenter Study

MeSH terms

  • Carcinoma, Non-Small-Cell Lung* / diagnosis
  • Carcinoma, Non-Small-Cell Lung* / drug therapy
  • Carcinoma, Non-Small-Cell Lung* / radiotherapy
  • Humans
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / drug therapy
  • Lung Neoplasms* / radiotherapy
  • Machine Learning
  • Models, Statistical
  • Neoplasm Staging
  • Prognosis
  • Retrospective Studies