Background: Genetic factors account for approximately 35% of colorectal cancer risk. Existing diagnostic biomarkers for colorectal cancer lack the sensitivity and specificity required for clinical application. The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning to build informative and predictive models of the underlying biological processes. This study aimed to identify diagnostic genes for colorectal cancer using machine learning methods.
Methods: The GSE41328 and GSE106582 data sets were downloaded from the Gene Expression Omnibus (GEO) database. Differences in gene expression between colon cancer and normal tissues were analyzed. The key colorectal cancer genes were screened and validated by Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) regression. Immune cell infiltration and its correlation with the key genes in patients with colon cancer were further analyzed with CIBERSORT.
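To illustrate how LASSO screening reduces a gene list, the following is a minimal sketch and not the authors' pipeline. The abstract does not state the software or the exact model used, so this example assumes a linear-regression LASSO on a 0/1 tumor label, fitted by cyclic coordinate descent. The gene names (gene_A, gene_B, gene_C), the expression values, and the penalty of 0.1 are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xb||^2 + penalty * ||b||_1 on centered data."""
    n, p = len(X), len(X[0])
    # Center every column and the response so that no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a constant gene carries no information
            # Correlation of gene j with the residual that excludes its own fit.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * coef[j])
                      for i in range(n)) / n
            new_coef = soft_threshold(rho, penalty) / z
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                coef[j] = new_coef
    return coef


# Hypothetical toy data: 4 tumor samples (label 1) followed by 4 normal samples (label 0).
gene_names = ["gene_A", "gene_B", "gene_C"]
X = [
    [8, 5, 3], [7, 6, 3], [8, 6, 3], [7, 5, 2],
    [4, 6, 3], [5, 5, 2], [4, 5, 2], [5, 6, 2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

coef = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [name for name, c in zip(gene_names, coef) if c != 0.0]
print(selected)
```

In this toy example only gene_A keeps a non-zero coefficient. gene_B is uncorrelated with the label, and gene_C is weakly informative but adds too little beyond gene_A to overcome the penalty, which is the behavior that makes LASSO useful for screening candidate genes.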
Results: Eleven key genes were identified as biomarkers for colon cancer, namely ASCL2, BEST4, CFD, DPEPCFD, FOXQ1, TRIB3, KLF4, MMP7, MMP11, PYY, and PDK4. The mean area under the receiver operating characteristic (ROC) curve (AUC) of the 11 genes for colon cancer diagnosis was 0.94 (range, 0.91-0.97). In the validation set, the expression of the 11 key genes differed significantly between colon cancer and normal subjects (P<0.05), and the mean AUC was 0.82 (range, 0.70-0.88). Immune cell infiltration analyses demonstrated that the relative quantities of plasma cells, T cells, B cells, NK cells, M0 macrophages, M1 macrophages, resting dendritic cells, resting mast cells, activated mast cells, and neutrophils in the tumor group differed significantly from those in the normal group.
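As a reminder of what a per-gene AUC measures, the sketch below computes it as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The function name roc_auc and the expression values are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values for one gene: 4 tumor samples, then 4 normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = [8.1, 7.4, 6.9, 5.2, 5.6, 4.8, 4.1, 3.9]

auc = roc_auc(labels, expression)
print(auc)  # 15 of the 16 tumor-normal pairs are ordered correctly: 0.9375
```

For a gene that is down-regulated in tumors, this calculation gives a value below 0.5; in that case the diagnostic performance is usually read as 1 minus the computed value.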
Conclusions: ASCL2, BEST4, CFD, DPEPCFD, FOXQ1, TRIB3, KLF4, MMP7, MMP11, PYY, and PDK4 were identified as the key genes for colon cancer diagnosis. These genes are expected to become novel diagnostic markers and targets of new pharmacotherapies for colorectal cancer.
Keywords: Diagnostic genes; colorectal neoplasms; immune infiltration; machine learning.
2022 Journal of Gastrointestinal Oncology. All rights reserved.