Prediction-oriented prognostic biomarker discovery with survival machine learning methods

NAR Genom Bioinform. 2023 Jun 16;5(2):lqad055. doi: 10.1093/nargab/lqad055. eCollection 2023 Jun.

Abstract

Identifying novel and reliable prognostic biomarkers for predicting patient survival outcomes is essential for deciding personalized treatment strategies for diseases such as cancer. Numerous feature selection techniques have been proposed to address the high-dimensionality problem in constructing prediction models. Feature selection not only lowers the data dimension but also improves the prediction accuracy of the resulting models by mitigating overfitting. The performance of these feature selection methods when applied to survival models, however, deserves further investigation. In this paper, we construct and compare a series of prediction-oriented biomarker selection frameworks by leveraging recent machine learning algorithms, including random survival forests, extreme gradient boosting, light gradient boosting and deep learning-based survival models. Additionally, we adapt the recently proposed prediction-oriented marker selection (PROMISE) to a survival model (PROMISE-Cox) as a benchmark approach. Our simulation studies indicate that boosting-based approaches tend to provide superior accuracy, with higher true positive rates and lower false positive rates, in more complicated scenarios. For demonstration purposes, we applied the proposed biomarker selection strategies to identify prognostic biomarkers in different modalities of head and neck cancer data.
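
As an illustration of how the true positive rate and false positive rate of a selected biomarker set could be scored in a simulation where the truly prognostic features are known, the following is a minimal sketch. The function name `selection_rates`, its parameters, and the toy data are hypothetical and are not taken from the paper. It assumes features are identified by integer index.

```python
def selection_rates(selected, true_markers, n_features):
    """Return (TPR, FPR) for a selected feature set in a simulation.

    Hypothetical helper: features are integer indices in [0, n_features).
    TPR = selected true markers / all true markers.
    FPR = selected null features / all null features.
    """
    selected = set(selected)
    true_markers = set(true_markers)
    n_null = n_features - len(true_markers)

    tp = len(selected & true_markers)   # true markers that were selected
    fp = len(selected - true_markers)   # null features that were selected

    tpr = tp / len(true_markers) if true_markers else 0.0
    fpr = fp / n_null if n_null > 0 else 0.0
    return tpr, fpr


# Toy example (made-up data): 100 candidate features, of which
# features 0-4 are truly prognostic; a method selected five features.
tpr, fpr = selection_rates(
    selected=[0, 1, 2, 7, 9],
    true_markers=range(5),
    n_features=100,
)
```

In this toy case three of the five true markers are recovered and two of the 95 null features are wrongly selected. A method with a higher TPR and a lower FPR recovers more true markers while admitting fewer spurious ones.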