CirRNAPL: A web server for the identification of circRNA based on extreme learning machine

Comput Struct Biotechnol J. 2020 Apr 2:18:834-842. doi: 10.1016/j.csbj.2020.03.028. eCollection 2020.

Abstract

Circular RNAs (circRNAs) play an important role in the development of diseases and offer a new avenue for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL, which extracts nucleic acid composition and structure features from the circRNA sequence and uses the particle swarm optimization algorithm to optimize an extreme learning machine. We compared CirRNAPL with existing methods, including BLAST, on three datasets and found that CirRNAPL significantly improved identification accuracy on all three, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences from the independent detection set of the third dataset and analyzed the expression level of circRNAs. The results showed that the expression level of a sequence is positively correlated with its abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.

Keywords: ACC, Accuracy; CNN, Convolutional Neural Networks; Circular RNA; DAC, Dinucleotide-based auto-covariance; DACC, Dinucleotide-based auto-cross-covariance; DCC, Dinucleotide-based cross-covariance; ELM, extreme learning machine; Expression level; Extreme learning machine; GAC, Geary autocorrelation; Identification; MAC, Moran autocorrelation; MCC, Matthews Correlation Coefficient; MRMD, Maximum-Relevance-Maximum-Distance; NMBAC, Normalized Moreau–Broto autocorrelation; PC-PseDNC-General, General parallel correlation pseudo-dinucleotide composition; PCGs, protein coding genes; PSO, particle swarm optimization algorithm; Particle swarm optimization algorithm; PseDPC, Pseudo-distance structure status pair composition; PseSSC, Pseudo-structure status composition; RBF, radial basis function; RF, random forest; SC-PseDNC-General, General series correlation pseudo-dinucleotide composition; SE, Sensitivity; SP, Specificity; SVM, support vector machine; Triplet, Local structure-sequence triplet element; circRNA, circular RNA; lncRNAs, long non-coding RNAs.
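
The evaluation measures named in the list above (ACC, SE, SP, MCC) have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those textbook formulas, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SE, SP and MCC from binary confusion-matrix counts.

    tp/fn: positives (e.g. circRNAs) predicted correctly / missed.
    tn/fp: negatives predicted correctly / wrongly flagged as positive.
    Ratios with an empty denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0       # Accuracy
    se = tp / (tp + fn) if (tp + fn) else 0.0       # Sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0       # Specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews corr. coef.
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


# Hypothetical counts, for illustration only (not results from the paper).
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside ACC because it uses all four counts and therefore stays informative when the positive and negative classes are unbalanced.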