Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers

Int J Mol Sci. 2023 Jul 6;24(13):11133. doi: 10.3390/ijms241311133.

Abstract

Colorectal cancer (CRC) is a leading cause of cancer deaths worldwide, and the identification of biomarkers can improve early detection and personalized treatment. In this study, RNA-seq data and gene chip data from TCGA and GEO were used to explore potential biomarkers for CRC. The SMOTE method was used to address class imbalance, and four feature selection algorithms (MCFS, Boruta, mRMR, and LightGBM) were used to select genes from the gene expression matrix. Four machine learning algorithms (SVM, XGBoost, RF, and kNN) were then employed to obtain the optimal number of genes for model construction. Through interpretable machine learning (IML), co-predictive networks were generated to identify rules and uncover underlying relationships among the selected genes. Survival analysis revealed that INHBA, FNBP1, PDE9A, HIST1H2BG, and CADM3 were significantly correlated with prognosis in CRC patients. In addition, the CIBERSORT algorithm was used to investigate the proportion of immune cells in CRC tissues, and gene mutation rates for the five selected biomarkers were explored. The biomarkers identified in this study have significant implications for the development of personalized therapies and could ultimately lead to improved clinical outcomes for CRC patients.
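
The class-balancing step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy expression values are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical toy data: 6 tumour samples, 3 normal samples, 2 genes each.
tumour = [[5.1, 2.0], [4.8, 2.2], [5.5, 1.9], [5.0, 2.1], [4.9, 2.4], [5.3, 2.0]]
normal = [[1.0, 6.0], [1.4, 5.6], [0.8, 6.3]]

# Oversample the minority class until both classes have the same size.
new_normal = smote(normal, n_new=len(tumour) - len(normal), k=2)
balanced_normal = normal + new_normal
print(len(tumour), len(balanced_normal))
```

In a full pipeline of this kind, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.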

Keywords: biomarker; colorectal cancer; feature selection; interpretable machine learning; machine learning.

MeSH terms

  • 3',5'-Cyclic-AMP Phosphodiesterases
  • Algorithms
  • Biomarkers
  • Biomarkers, Tumor / genetics
  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / genetics
  • Genes, Regulator
  • Humans
  • Machine Learning
  • Transcription Factors
  • Tumor Microenvironment / genetics

Substances

  • Transcription Factors
  • Biomarkers
  • Biomarkers, Tumor
  • PDE9A protein, human
  • 3',5'-Cyclic-AMP Phosphodiesterases