Construction of Prognostic Prediction Models for Colorectal Cancer Based on Ferroptosis-Related Genes: A Multi-Dataset and Multi-Model Analysis

Biomed Eng Comput Biol. 2024 Nov 2:15:11795972241293516. doi: 10.1177/11795972241293516. eCollection 2024.

Abstract

Background: Colorectal cancer (CRC) remains a significant health burden globally, necessitating a deeper understanding of its molecular landscape and prognostic markers. This study characterized ferroptosis-related genes (FRGs) to construct models for predicting overall survival (OS) across various CRC datasets.

Methods: In the TCGA-COAD dataset, differentially expressed genes (DEGs) between tumor and normal tissues were identified using the DESeq2 package. Genes associated with OS, disease-specific survival, and progression-free interval were identified using the survival package. Additionally, FRGs were downloaded from the FerrDb website and categorized into unclassified, marker, and driver genes. Finally, multiple models (CoxBoost, Elastic Net, Gradient Boosting Machine, LASSO Regression, Partial Least Squares Regression for Cox Regression, Ridge Regression, Random Survival Forest [RSF], stepwise Cox Regression, Supervised Principal Components analysis, and Support Vector Machines) were employed to predict OS across multiple datasets (TCGA-COAD, GSE103479, GSE106584, GSE17536, GSE17537, GSE29621, GSE39084, GSE39582, and GSE72970), using the genes at the intersection of the DEGs, the genes associated with OS, disease-specific survival, and progression-free interval, and the FRG categories.
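The gene-selection step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R packages). The function name `intersect_gene_sets`, the placeholder gene labels, and the choice to pool the three FerrDb categories into one set before intersecting are all assumptions made for illustration.

```python
def intersect_gene_sets(degs, os_genes, dss_genes, pfi_genes, frg_categories):
    """Return genes present in every input list and in at least one FRG category.

    Assumption: the FerrDb categories (driver, marker, unclassified) are
    pooled into a single set before intersecting; the abstract does not
    spell out this detail.
    """
    frgs = set().union(*frg_categories.values())
    shared = set(degs) & set(os_genes) & set(dss_genes) & set(pfi_genes) & frgs
    return sorted(shared)


# Hypothetical placeholder labels, not real results from the study.
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
os_genes = {"GENE_A", "GENE_B", "GENE_D"}
dss_genes = {"GENE_A", "GENE_B", "GENE_E"}
pfi_genes = {"GENE_A", "GENE_B", "GENE_C"}
frg_categories = {
    "driver": {"GENE_A"},
    "marker": {"GENE_B", "GENE_F"},
    "unclassified": {"GENE_C"},
}

selected = intersect_gene_sets(degs, os_genes, dss_genes, pfi_genes, frg_categories)
print(selected)
```

Only genes that survive every filter are passed on as model features, which keeps the feature set small before the survival models are fitted.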

Results: Six intersection genes (ASNS, TIMP1, H19, CDKN2A, HOTAIR, and ASMTL-AS1) were identified; all were upregulated in tumor tissues and associated with poor survival outcomes. In the TCGA-COAD dataset, the RSF model demonstrated the highest concordance index. Kaplan-Meier analysis revealed significantly lower OS probabilities in high-risk groups identified by the RSF model. The RSF model exhibited high accuracy, with AUC values of 0.978, 0.985, and 0.965 for 1-, 3-, and 5-year survival predictions, respectively. Calibration curves demonstrated excellent agreement between predicted and observed survival probabilities. Decision curve analysis confirmed the clinical utility of the RSF model. Additionally, the model's performance was validated in the GSE29621 dataset.
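The concordance index used to rank the models can be sketched in plain Python as Harrell's C: the fraction of comparable patient pairs in which the patient with the shorter survival time received the higher risk score. This is a simplified illustration, not the authors' implementation. The function name and the toy data are hypothetical, and pairs with tied survival times are skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (simplified sketch).

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied times are ignored
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only (not from the study).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]

c_perfect = concordance_index(times, events, risk)
c_reversed = concordance_index(times, events, risk[::-1])
print(c_perfect, c_reversed)
```

A value of 1.0 means the risk scores order every comparable pair correctly, 0.5 corresponds to random ordering, and 0.0 means the ordering is fully reversed.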

Conclusions: The study underscores the prognostic relevance of the 6 intersection genes in CRC, providing insights into potential therapeutic targets and biomarkers for patient stratification. The RSF model demonstrates robust predictive performance, suggesting its utility in clinical risk assessment and personalized treatment strategies.

Keywords: Colorectal cancer; GEO; TCGA; ferroptosis-related genes; survival prediction models.