Machine learning-based analysis identifies a 13-gene prognostic signature to improve the clinical outcomes of colorectal cancer

J Gastrointest Oncol. 2024 Oct 31;15(5):2100-2116. doi: 10.21037/jgo-24-325. Epub 2024 Oct 24.

Abstract

Background: Colorectal cancer (CRC) is a common intestinal malignancy worldwide and poses a serious threat to public health. Because CRC is highly heterogeneous, prognosis and drug response vary widely among patients, which limits the effectiveness of traditional treatment. This study therefore aims to construct a novel CRC prognostic signature using machine learning algorithms to support informed clinical decisions and improve treatment outcomes.

Methods: Gene expression matrices and clinical information of CRC patients were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Genes with prognostic value were then identified through univariate Cox regression analysis. Next, nine machine learning algorithms, including least absolute shrinkage and selection operator (LASSO), gradient boosting machine (GBM), CoxBoost, plsRcox, Ridge, Enet, StepCox, SuperPC, and survivalSVM, were integrated to form 97 combinations, which were used to select the best strategy for building a prognostic model based on the average C-index across the three CRC cohorts. Kaplan-Meier survival analysis, receiver operating characteristic (ROC) curve analysis, and multivariate regression analysis were conducted to assess the predictive performance of the constructed signature. Furthermore, the CIBERSORT and ESTIMATE algorithms were used to quantify the infiltration level of immune cells. In addition, a nomogram was developed to predict 1-, 2-, and 3-year overall survival (OS) probabilities for individual patients.
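
The model-selection criterion described above (ranking candidate models by their average C-index across cohorts) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, cohort names, model names, and toy survival data are all hypothetical, and Harrell's C-index is assumed as the concordance measure, which the abstract does not specify.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (assumed here): the fraction of comparable patient
    pairs in which the patient with the shorter survival time has the
    higher predicted risk. Ties in risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs with tied times or where the earlier time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


def best_model_by_mean_c_index(cohorts, predictions):
    """cohorts: {cohort_name: (times, events)}
    predictions: {model_name: {cohort_name: risk_scores}}
    Returns the model with the highest mean C-index and all mean values."""
    mean_c = {}
    for model, per_cohort in predictions.items():
        values = [
            concordance_index(times, events, per_cohort[name])
            for name, (times, events) in cohorts.items()
        ]
        mean_c[model] = sum(values) / len(values)
    best = max(mean_c, key=mean_c.get)
    return best, mean_c


# Hypothetical toy data: survival time, event flag (1 = death observed).
toy_cohorts = {
    "cohort_A": ([5, 10, 15, 20], [1, 1, 0, 1]),
    "cohort_B": ([3, 6, 9], [1, 0, 1]),
}
# Hypothetical risk scores from two candidate algorithm combinations.
toy_predictions = {
    "model_1": {"cohort_A": [0.9, 0.7, 0.4, 0.1], "cohort_B": [0.8, 0.5, 0.2]},
    "model_2": {"cohort_A": [0.1, 0.7, 0.4, 0.9], "cohort_B": [0.2, 0.5, 0.8]},
}

best, mean_c = best_model_by_mean_c_index(toy_cohorts, toy_predictions)
print(best, mean_c)
```

In this toy setting the model whose risk scores decrease with survival time obtains the higher mean C-index and is selected; with real data, each of the candidate combinations would be fitted on the training cohort and scored on every cohort in the same way.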

Results: A prognostic signature consisting of 13 genes was developed using LASSO Cox regression and GBM methods. Across both the training and validation datasets, performance evaluation consistently indicated that the signature accurately predicted the prognosis of CRC patients. Notably, compared with 30 published signatures, the 13-gene model exhibited dramatically superior predictive power. Even within clinical subgroups, it could still precisely stratify prognosis. Functional analysis revealed a robust association between the signature and both immune status and chemotherapy response in CRC patients. Furthermore, a nomogram was created based on the signature-derived risk score, which demonstrated a strong predictive ability for OS in CRC patients.

Conclusions: The 13-gene prognostic signature is expected to be a valuable tool for risk stratification, survival prediction, and treatment evaluation of patients with CRC.

Keywords: Colorectal cancer (CRC); machine learning; prognosis; signature; survival.