Glioma stages prediction based on machine learning algorithm combined with protein-protein interaction networks

Genomics. 2020 Jan;112(1):837-847. doi: 10.1016/j.ygeno.2019.05.024. Epub 2019 May 29.

Abstract

Background: Glioma is the most lethal nervous system cancer. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms are still unclear. This study was designed to reveal the molecular mechanisms of glioma using protein-protein interaction (PPI) networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected using the PPI networks.

Results: In total, 19 genes were selected between grade I and grade II, 21 genes between grade II and grade III, and 20 genes between grade III and grade IV. Five machine learning methods were then used to predict glioma stages from the selected key genes. After comparison, a Complement Naive Bayes classifier was chosen to build the prediction model for grade II-III, with an accuracy of 72.8%. Random forest was chosen to build the prediction models for grade I-II and grade III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed using PPI networks, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect that these key genes can offer guidance on the occurrence of gliomas or, at the very least, be useful to tumor researchers.

Conclusion: Machine learning combined with PPI networks, GO and KEGG analyses of selected DEGs improves our understanding of the biological functions involved in glioma growth.
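
The abstract does not state how the PPI networks were used to screen key DEGs. One common approach, assumed here purely for illustration, is to rank DEGs by their degree (number of interaction partners) in the PPI sub-network formed by the DEGs themselves. The sketch below shows that idea in Python; the function name rank_hub_genes, the placeholder gene labels G1 to G7 and the edge list are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def rank_hub_genes(edges, degs, top_n):
    """Rank DEGs by degree in the PPI sub-network restricted to DEGs.

    Assumption: degree-based hub selection; the study may have used a
    different criterion.
    """
    deg_set = set(degs)
    neighbors = defaultdict(set)
    for a, b in edges:
        # Keep only interactions where both partners are DEGs; ignore self-loops.
        if a in deg_set and b in deg_set and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by degree (descending), then by name so ties are deterministic.
    ranked = sorted(deg_set, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_n]]


# Hypothetical toy input: placeholder gene labels, not real data.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G5", "G1"), ("G6", "G7")]
degs = ["G1", "G2", "G3", "G4", "G5"]

for gene, degree in rank_hub_genes(edges, degs, top_n=3):
    print(gene, degree)
```

In this toy network, G6 and G7 are ignored because they are not in the DEG list, and the remaining genes are ordered by how many DEG partners they have. The top-ranked genes would then serve as input features for the classifiers compared in the Results.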

Keywords: ANN; Couple naïve Bayes; DEGs; GO; KEGG; Machine learning; PPI networks; Random forest; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Brain Neoplasms / diagnosis
  • Brain Neoplasms / genetics*
  • Brain Neoplasms / metabolism*
  • Gene Expression
  • Gene Ontology
  • Glioma / diagnosis
  • Glioma / genetics*
  • Glioma / metabolism*
  • Machine Learning*
  • Neoplasm Staging
  • Protein Interaction Mapping*