SVM-T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles

Biochem Biophys Res Commun. 2012 Mar 9;419(2):148-53. doi: 10.1016/j.bbrc.2012.01.087. Epub 2012 Jan 28.

Abstract

Although metastasis is the principal cause of death for colorectal cancer (CRC) patients, the molecular mechanisms underlying CRC metastasis are still not fully understood. In an attempt to identify metastasis-related genes in CRC, we obtained gene expression profiles of 55 early stage primary CRCs, 56 late stage primary CRCs, and 34 metastatic CRCs from the expression project in Oncology (http://www.intgen.org/expo/). We developed a novel gene selection algorithm, SVM-T-RFE, which extends the support vector machine recursive feature elimination (SVM-RFE) algorithm by incorporating the t-statistic. We achieved the highest classification accuracy (100%) with smaller gene subsets (10 and 6 genes, respectively) when classifying early versus late stage primary CRCs and metastatic CRCs versus late stage primary CRCs. We also compared the performance of the SVM-T-RFE and SVM-RFE gene selection algorithms on another large-scale CRC dataset and on five public microarray datasets. SVM-T-RFE outperformed SVM-RFE in identifying more differentially expressed genes and in achieving the highest prediction accuracy with an equal or smaller number of selected genes. Some of the selected genes have previously been reported to be associated with CRC development or metastasis.
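The abstract does not state how the t-statistic enters the elimination criterion. The following is a minimal, hypothetical sketch in pure Python, assuming that in each round a linear SVM is trained on the remaining genes, the absolute SVM weights and absolute Welch t-statistics are each scaled by their maximum, and the gene with the lowest combined score beta * |w| + (1 - beta) * |t| is removed. The weighting parameter `beta`, the max-normalization, the Pegasos-style subgradient SVM solver, the toy data, and all function names (`welch_t`, `train_linear_svm`, `normalise`, `svm_t_rfe`, `toy_data`) are assumptions for illustration, not the authors' published implementation.

```python
import math


def welch_t(values, labels):
    """Welch two-sample t-statistic for one gene (class +1 vs class -1)."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == -1]

    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    m1, v1 = mean_var(pos)
    m2, v2 = mean_var(neg)
    se = math.sqrt(v1 / len(pos) + v2 / len(neg))
    return (m1 - m2) / se if se > 0 else 0.0


def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Pegasos-style subgradient training of a linear SVM.
    No bias term, so features are assumed to be roughly centred."""
    w = [0.0] * len(X[0])
    step = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            step += 1
            eta = 1.0 / (lam * step)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge-loss violation: step towards y * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def normalise(scores):
    """Scale non-negative scores by their maximum (all zeros stay zero)."""
    top = max(scores)
    return [s / top for s in scores] if top > 0 else [0.0] * len(scores)


def svm_t_rfe(X, y, n_keep, beta=0.5):
    """Hypothetical SVM-T-RFE variant: repeatedly drop the gene with the
    lowest beta * |w|/max|w| + (1 - beta) * |t|/max|t|.
    Returns (surviving gene indices, eliminated genes in removal order)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        sub = [[row[g] for g in remaining] for row in X]
        w = train_linear_svm(sub, y)
        t = [welch_t([row[j] for row in sub], y) for j in range(len(remaining))]
        w_n = normalise([abs(v) for v in w])
        t_n = normalise([abs(v) for v in t])
        score = [beta * a + (1 - beta) * b for a, b in zip(w_n, t_n)]
        worst = min(range(len(remaining)), key=score.__getitem__)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


def toy_data():
    """Hypothetical toy data: genes 0 and 1 separate the two classes;
    genes 2-5 take identical values in both classes (pure noise)."""
    X, y = [], []
    for i in range(10):
        j0 = ((3 * i) % 10 - 4.5) / 10
        j1 = ((7 * i) % 10 - 4.5) / 10
        noise = [((i * (k + 1) + k) % 7 - 3) / 3 for k in range(4)]
        for label in (1, -1):
            X.append([label * (2 + j0), label * (1.5 + j1)] + noise)
            y.append(label)
    return X, y


X, y = toy_data()
kept, dropped = svm_t_rfe(X, y, n_keep=2)
print("kept genes:", sorted(kept), "removal order:", dropped)
```

In this sketch, `beta=1` reduces the criterion to plain SVM-RFE ranking by |w|, while `beta=0` ranks genes by the t-statistic alone. On real microarray data the expression values would usually be standardized first, and several genes could be removed per round to keep the number of SVM fits manageable.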

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Colorectal Neoplasms / genetics*
  • Colorectal Neoplasms / pathology*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic*
  • Genes, Neoplasm*
  • Humans
  • Neoplasm Metastasis
  • Oligonucleotide Array Sequence Analysis / methods*
  • RNA, Messenger / genetics

Substances

  • RNA, Messenger