Machine Learning Reduced Gene/Non-Coding RNA Features That Classify Schizophrenia Patients Accurately and Highlight Insightful Gene Clusters

Int J Mol Sci. 2021 Mar 25;22(7):3364. doi: 10.3390/ijms22073364.

Abstract

RNA-seq is a powerful method for detecting differentially expressed genes and long non-coding RNAs (lncRNAs) in schizophrenia (SCZ) patients; however, because of overfitting, differentially expressed targets (DETs) cannot be used reliably as biomarkers. This study used machine learning to reduce the number of gene and non-coding RNA features. Dorsolateral prefrontal cortex (DLPFC) RNA-seq data from 254 individuals were obtained from the CommonMind Consortium. The average predictive accuracy for SCZ patients was 67% based on coding genes and 96% based on lncRNAs. Machine learning is a powerful approach for reducing candidate features to functional biomarkers in SCZ patients. The lncRNAs capture the characteristics of SCZ tissue more accurately than mRNAs, as the former regulate every level of gene expression and are not limited to mRNA levels.
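
The abstract does not say which algorithm was used, so the following is only a minimal sketch of the general idea: reduce a large feature set to a few candidates and estimate average predictive accuracy without overfitting. It assumes synthetic data in place of the CommonMind expression matrix, a simple mean-difference score for ranking features, and a nearest-centroid classifier. All function names and parameters (make_data, select_features, k_features, and so on) are hypothetical and are not taken from the paper. The key point it illustrates is that feature selection is repeated inside each cross-validation fold, so held-out samples never influence which features are chosen.

```python
import random
import statistics


def make_data(n_samples=60, n_features=200, n_informative=5, shift=2.0, seed=0):
    """Build a synthetic expression matrix (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 0 = control, 1 = case
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            # Only the first n_informative features differ between classes.
            for j in range(n_informative):
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def select_features(X, y, k):
    """Keep the k features with the largest class-mean difference,
    scaled by the overall standard deviation of that feature."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a + b) or 1.0
        diff = abs(statistics.fmean(a) - statistics.fmean(b))
        scores.append((diff / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroids(X, y, feats):
    """Per-class mean vector restricted to the selected features."""
    out = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        out[lab] = [statistics.fmean([r[j] for r in rows]) for j in feats]
    return out


def predict(row, cents, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return min(cents, key=dist)


def cross_validated_accuracy(X, y, k_features=5, n_folds=5, seed=0):
    """Average held-out accuracy, with feature selection redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    accs = []
    for test_idx in folds:
        held = set(test_idx)
        train_idx = [i for i in idx if i not in held]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection uses training samples only, to avoid optimistic bias.
        feats = select_features(X_train, y_train, k_features)
        cents = centroids(X_train, y_train, feats)
        correct = sum(predict(X[i], cents, feats) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return statistics.fmean(accs)


if __name__ == "__main__":
    X, y = make_data()
    print("mean cross-validated accuracy:", cross_validated_accuracy(X, y))
```

Selecting features on the full dataset before cross-validation would let held-out samples leak into the selection step and inflate the reported accuracy, which is one form of the overfitting problem the abstract mentions for DETs.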

Keywords: long non-coding RNAs; machine learning; schizophrenia; transcriptome.

MeSH terms

  • Algorithms
  • Biomarkers / metabolism
  • Computational Biology / methods
  • Diagnosis, Computer-Assisted
  • Factor Analysis, Statistical
  • Humans
  • Machine Learning*
  • Multigene Family*
  • Prefrontal Cortex / metabolism*
  • RNA, Long Noncoding / genetics
  • RNA, Messenger / genetics
  • RNA, Untranslated / genetics*
  • RNA-Seq
  • Schizophrenia / diagnosis*
  • Schizophrenia / genetics*
  • Transcriptome

Substances

  • Biomarkers
  • RNA, Long Noncoding
  • RNA, Messenger
  • RNA, Untranslated