RNAirport: a deep neural network-based database characterizing representative gene models in plants

J Genet Genomics. 2024 Jun;51(6):652-664. doi: 10.1016/j.jgg.2024.03.004. Epub 2024 Mar 20.

Abstract

A 5'-leader, known initially as the 5'-untranslated region, can exist in multiple isoforms owing to alternative splicing (aS) and alternative transcription start sites (aTSS). A representative 5'-leader is therefore needed to examine the embedded RNA regulatory elements that control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. Using a Kruskal-Wallis test-based algorithm, we rank the intra-sample and inter-sample frequencies of aS-mediated transcript isoforms and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap-analysis gene expression data. The model accurately predicts the 5'-end, as confirmed experimentally in Arabidopsis and rice. The gene models containing the representative 5'-leaders, together with 5'leaderP, can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5'-leaders records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, similar to the project recently initiated by human GENCODE.
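
To illustrate the rank-based idea behind the isoform ranking step, the following minimal Python sketch computes a Kruskal-Wallis H statistic over per-sample isoform usage values and reports the isoform with the highest mean rank. It is not the published RNAirport algorithm: the function names, the isoform labels, the usage fractions, and the rule of picking the highest mean rank are all assumptions made for illustration. The sketch also assumes every isoform has at least one observation and that there are at least two observations in total.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, mean rank per group) for a dict mapping name -> observations."""
    names = list(groups)
    pooled = list(chain.from_iterable(groups[name] for name in names))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted_sum = 0.0
    mean_ranks = {}
    start = 0
    for name in names:
        size = len(groups[name])
        rank_sum = sum(ranks[start:start + size])
        mean_ranks[name] = rank_sum / size
        weighted_sum += rank_sum ** 2 / size
        start += size

    h = 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3.0 * (n_total + 1)

    # Standard tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    h = h / correction if correction > 0 else 0.0
    return h, mean_ranks


def pick_representative(groups):
    """Hypothetical selection rule: the isoform with the highest mean rank."""
    h, mean_ranks = kruskal_wallis(groups)
    best = max(mean_ranks, key=mean_ranks.get)
    return best, h


if __name__ == "__main__":
    # Made-up usage fractions of three 5'-leader isoforms across four samples.
    usage = {
        "isoform_A": [0.70, 0.65, 0.80, 0.75],
        "isoform_B": [0.20, 0.25, 0.15, 0.20],
        "isoform_C": [0.10, 0.10, 0.05, 0.05],
    }
    name, h = pick_representative(usage)
    print(f"representative: {name}, H = {h:.3f}")
```

In practice, H would be compared against a chi-square distribution with k - 1 degrees of freedom (k being the number of isoforms) to decide whether the usage differences are significant before a representative is assigned.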

Keywords: 5′-leader; Deep learning; RNA regulatory elements; Synthetic biology; Transcript isoforms; Translational control; uORF.

MeSH terms

  • 5' Untranslated Regions* / genetics
  • Algorithms
  • Alternative Splicing / genetics
  • Arabidopsis / genetics
  • Databases, Genetic
  • Deep Learning
  • Gene Expression Regulation, Plant / genetics
  • Models, Genetic
  • Neural Networks, Computer
  • Oryza / genetics
  • Plants / genetics
  • RNA, Plant / genetics
  • Transcription Initiation Site

Substances

  • 5' Untranslated Regions
  • RNA, Plant