m5U-GEPred: prediction of RNA 5-methyluridine sites based on sequence-derived and graph embedding features

Front Microbiol. 2023 Oct 23:14:1277099. doi: 10.3389/fmicb.2023.1277099. eCollection 2023.

Abstract

5-Methyluridine (m5U) is one of the most common post-transcriptional RNA modifications and is involved in a variety of important biological processes and in disease development. Precise identification of m5U sites allows for a better understanding of the biological processes of RNA and contributes to the discovery of new functional and therapeutic RNA targets. Here, we present m5U-GEPred, a prediction framework that combines sequence characteristics with graph embedding-based information for m5U identification. The graph embedding approach was introduced to extract global information from the training data, complementing the local information represented by conventional sequence features and thereby enhancing the prediction performance. m5U-GEPred outperformed state-of-the-art m5U predictors on data from two species, with average AUROCs of 0.984 and 0.985 on human and yeast transcriptomes, respectively. To further validate the performance of the proposed framework, experimentally validated m5U sites identified by Oxford Nanopore Technology (ONT) sequencing were collected as independent testing data; on this dataset, m5U-GEPred achieved reasonable prediction performance, with an accuracy (ACC) of 91.84%. We hope that m5U-GEPred will serve as a useful computational alternative for m5U identification.
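As a rough illustration of the idea of pairing local sequence features with a global embedding, the sketch below one-hot encodes a short RNA window and appends a precomputed embedding vector. It is not the authors' implementation: the abstract does not specify the encoding, window size, or embedding method, so the one-hot scheme, the function names `one_hot` and `combine_features`, the 5-nt window, and the embedding values are all hypothetical.

```python
from typing import List

NUCLEOTIDES = "ACGU"


def one_hot(window: str) -> List[float]:
    """Encode an RNA window as a flat one-hot vector (4 values per base).

    Assumption: DNA-style 'T' is treated as 'U'; any other symbol
    (for example 'N') is encoded as four zeros.
    """
    features: List[float] = []
    for base in window.upper().replace("T", "U"):
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return features


def combine_features(window: str, embedding: List[float]) -> List[float]:
    """Concatenate local sequence features with a global embedding vector.

    The embedding is assumed to be computed elsewhere (for example by a
    graph embedding method over the training data) and passed in as-is.
    """
    return one_hot(window) + list(embedding)


# Hypothetical 5-nt window centred on a uridine, plus a made-up
# 3-dimensional embedding used purely for illustration.
example = combine_features("GAUCG", [0.12, -0.40, 0.07])
print(len(example))  # 5 bases * 4 values + 3 embedding values
```

The combined vector could then be passed to any standard classifier; which classifier the framework actually uses is not stated in the abstract.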

Keywords: 5-methyluridine; RNA modification; graph embedding; multi-species; sequence feature.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Supercomputing Platform of Xi'an Jiaotong-Liverpool University, the National Natural Science Foundation of China (31671373 and 61971422), the Scientific Research Foundation of Nanjing University of Chinese Medicine (Grant No. 013038030001), and the XJTLU Key Program Special Fund (KSF-E-51 and KSF-P-02).