EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA-protein interaction prediction

BMC Bioinformatics. 2021 Mar 19;22(1):133. doi: 10.1186/s12859-021-04069-9.

Abstract

Background: Interactions between non-coding RNAs (ncRNAs) and proteins play essential roles in various physiological and pathological processes. The experimental methods used to identify ncRNA-protein interactions are time-consuming and labor-intensive. There is therefore an increasing demand for computational methods that predict ncRNA-protein interactions accurately and efficiently.

Results: In this work, we present EDLMFC, an ensemble deep learning method that predicts ncRNA-protein interactions from a combination of multi-scale features: primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mer encoding is used to extract protein and ncRNA sequence features, which are integrated with tertiary structure features and fed into an ensemble deep learning model. The model combines a convolutional neural network (CNN), which learns the dominant biological information, with a bi-directional long short-term memory network (BLSTM), which captures long-range dependencies among the features identified by the CNN. Under five-fold cross-validation, EDLMFC outperformed other state-of-the-art methods, with accuracies of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. Independent tests demonstrated that EDLMFC can effectively predict potential ncRNA-protein interactions from different organisms. Furthermore, EDLMFC successfully predicted hub ncRNAs and proteins in ncRNA-protein networks of Mus musculus.
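As an illustration of the conjoint k-mer encoding mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes a seven-group reduced amino-acid alphabet of the kind commonly used for conjoint triad features, with k = 3 for proteins and k = 4 for ncRNAs. The grouping, the values of k, and all function names are assumptions for illustration; the abstract does not specify them.

```python
from itertools import product

# Assumed 7-group reduced amino-acid alphabet (conjoint-triad style).
# The abstract does not state which grouping EDLMFC uses.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(AA_GROUPS) for aa in grp}


def kmer_frequencies(seq, alphabet, k):
    """Return normalized k-mer frequencies over `alphabet`, in a fixed order."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing unknown symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


def protein_conjoint_features(protein, k=3):
    """Map residues to group labels, then count k-mers of labels (7**k dims)."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_frequencies(reduced, "0123456", k)


def rna_kmer_features(rna, k=4):
    """Count k-mers over the nucleotides A, C, G, U (4**k dims)."""
    return kmer_frequencies(rna.upper().replace("T", "U"), "ACGU", k)
```

With these assumed settings, a protein maps to a 343-dimensional vector and an ncRNA to a 256-dimensional vector. In a pipeline like the one described, such vectors would be concatenated with structure-derived features before being passed to the CNN-BLSTM model.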

Conclusions: Overall, the proposed method, EDLMFC, improves the accuracy of ncRNA-protein interaction prediction and is expected to provide helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC .

Keywords: Conjoint k-mer; Ensemble deep learning; Independent test; Multi-scale features combination; ncRNA–protein interactions; ncRNA–protein networks.

MeSH terms

  • Animals
  • Computational Biology*
  • Deep Learning*
  • Mice
  • Neural Networks, Computer
  • RNA, Untranslated
  • Software

Substances

  • RNA, Untranslated