CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction

Interdiscip Sci. 2022 Jun;14(2):439-451. doi: 10.1007/s12539-021-00500-0. Epub 2022 Feb 1.

Abstract

N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Owing to limitations in techniques and knowledge, large-scale identification of ac4C remains challenging. RNA sequences resemble sentences in natural language in that they carry semantics. Inspired by this analogy, we propose a hybrid model for ac4C prediction. The model uses a long short-term memory network and a convolutional neural network to extract the semantic features hidden in the sequences. These semantic features are combined with two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) to represent ac4C and non-ac4C sequences, and eXtreme Gradient Boosting (XGBoost) is used as the learning algorithm. Five-fold cross-validation on the training set of 1160 ac4C and 10,855 non-ac4C sequences yielded an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test on 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The semantic features alone performed better than either the k-nucleotide frequencies or the pseudo tri-tuple nucleotide composition, suggesting that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server that is freely available to the scientific community: http://47.113.117.61/ac4c/ . The presented model and tool should facilitate large-scale identification of ac4C.
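
As an illustration of one of the traditional features named above, the following is a minimal sketch of how k-nucleotide frequencies could be computed for an RNA sequence. It is not the authors' implementation: the function name `k_nucleotide_frequencies`, the default `k=2`, the conversion of T to U, and the choice to skip windows containing ambiguous bases are all assumptions made for this example, since the abstract does not state which values of k or which preprocessing steps were used.

```python
from itertools import product


def k_nucleotide_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the normalized frequency of every possible k-mer in seq.

    Hypothetical helper for illustration only; k and the preprocessing
    are assumptions, not details taken from the paper.
    """
    # Assumption: treat DNA-style input as RNA by mapping T to U.
    seq = seq.upper().replace("T", "U")

    # Enumerate all 4**k k-mers in a fixed lexicographic order so that
    # every sequence maps to a vector of the same length and layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1

    # Normalize counts to frequencies; an empty sequence gives all zeros.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    features = k_nucleotide_frequencies("ACGUACGUAC", k=2)
    print(len(features))  # one entry per possible dinucleotide
```

A vector of this kind could be concatenated with other feature vectors before being passed to a gradient-boosting classifier, which is the general combination strategy the abstract describes.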

Keywords: Convolution neural network; Deep learning; Long short-term memory; N4-Acetylcytidine; RNA modification; XGBoost.

MeSH terms

  • Algorithms
  • Cytidine* / analogs & derivatives
  • Cytidine* / genetics
  • Nucleotides*
  • ROC Curve

Substances

  • Nucleotides
  • N-acetylcytidine
  • Cytidine