A comprehensive comparison and analysis of computational predictors for RNA N6-methyladenosine sites of Saccharomyces cerevisiae

Brief Funct Genomics. 2019 Nov 19;18(6):367-376. doi: 10.1093/bfgp/elz018.

Abstract

N6-methyladenosine (m6A) modification, one of the most common post-transcriptional modifications in RNA, has been reported to be closely related to many biological processes. Over the past decade, several tools for predicting m6A sites in Saccharomyces cerevisiae have been developed and made freely available online. However, the quality of the predictions made by these tools is difficult to quantify and compare. In this study, an independent dataset, M6Atest6540, was compiled to systematically evaluate nine publicly available m6A prediction tools for S. cerevisiae. The experimental results indicate that RAM-ESVM achieved the best performance on M6Atest6540; however, most models performed substantially worse than reported in their original papers. The benchmark dataset Met2614, which was used as the training dataset for the nine methods, was further analyzed using a position bias index. The results demonstrated that the bias of Met2614 differs significantly from that of the RNA segments around m6A sites recorded in RMBase. Moreover, a new dataset, newMet2614, was collected by randomly selecting RNA segments from the non-redundant data recorded in RMBase, and three different kinds of features were extracted. Models built on Met2614 and newMet2614 with these features were compared, and the models built on newMet2614 generalized better. Our results also indicate that position-specific propensity-based features outperform the other features, although they are also easily over-fitted on a biased dataset.
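
To make the idea of a position-specific propensity feature concrete, the following is a minimal sketch of one common formulation: the difference between per-position nucleotide frequencies in positive and negative training segments. The abstract does not define the features used in the study, so this formulation, the function names (`position_frequencies`, `propensity_matrix`, `encode`) and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides; any other symbol is scored as 0.0


def position_frequencies(seqs):
    """Return per-position nucleotide frequencies for equal-length segments."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all segments must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({nt: counts.get(nt, 0) / len(seqs) for nt in ALPHABET})
    return freqs


def propensity_matrix(positives, negatives):
    """Positive-minus-negative frequency of each nucleotide at each position.

    Assumed formulation, not necessarily the one used in the paper.
    """
    pos_freqs = position_frequencies(positives)
    neg_freqs = position_frequencies(negatives)
    return [
        {nt: p[nt] - n[nt] for nt in ALPHABET}
        for p, n in zip(pos_freqs, neg_freqs)
    ]


def encode(seq, matrix):
    """Map a segment to one propensity score per position."""
    return [matrix[i].get(nt, 0.0) for i, nt in enumerate(seq)]


# Toy, made-up training segments centred on an adenosine (index 2).
positives = ["GGACU", "AGACA"]
negatives = ["CCAGG", "UUAUU"]

matrix = propensity_matrix(positives, negatives)
features = encode("GGACU", matrix)
print(features)  # [0.5, 1.0, 0.0, 1.0, 0.0]
```

Because the matrix is estimated directly from the training segments, any positional bias in those segments is copied into the features, which is consistent with the observation that such features are easily over-fitted on a biased dataset. In any evaluation, the matrix should be built from training data only and then applied unchanged to the independent test set.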

Keywords: N6-methyladenosine sites; computational predictor; dataset bias; position-specific propensity; web servers.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine / analogs & derivatives*
  • Adenosine / metabolism
  • Base Sequence
  • Computational Biology / methods
  • Datasets as Topic
  • Machine Learning
  • RNA / analysis*
  • RNA / metabolism*
  • RNA Processing, Post-Transcriptional
  • RNA, Fungal / analysis
  • RNA, Fungal / metabolism
  • Saccharomyces cerevisiae / genetics*
  • Sequence Analysis, RNA / methods*
  • Transcriptome

Substances

  • RNA, Fungal
  • RNA
  • N-methyladenosine
  • Adenosine