RNA-seq has been a powerful method to detect the differentially expressed genes/long non-coding RNAs (lncRNAs) in schizophrenia (SCZ) patients; however, due to overfitting problems differentially expressed targets (DETs) cannot be used properly as biomarkers. This study used machine learning to reduce gene/non-coding RNA features. Dorsolateral prefrontal cortex (dlpfc) RNA-seq data from 254 individuals was obtained from the CommonMind consortium. The average predictive accuracy for SCZ patients was 67% based on coding genes, and 96% based on long non-coding RNAs (lncRNAs). Machine learning is a powerful algorithm to reduce functional biomarkers in SCZ patients. The lncRNAs capture the characteristics of SCZ tissue more accurately than mRNA as the former regulate every level of gene expression, not limited to mRNA levels.
Keywords: long non-coding RNAs; machine learning; schizophrenia; transcriptome.