NMDtxDB: data-driven identification and annotation of human NMD target transcripts

RNA. 2024 Sep 16;30(10):1277-1291. doi: 10.1261/rna.080066.124.

Abstract

The nonsense-mediated RNA decay (NMD) pathway is a crucial mechanism of mRNA quality control. Current annotations of NMD substrate RNAs are rarely data-driven and instead rely on generally established rules. We present a data set covering four cell lines with combinations of SMG5, SMG6, and SMG7 knockdowns or SMG7 knockout. Based on this data set, we implemented a workflow that combines Nanopore and Illumina sequencing to assemble a transcriptome enriched for NMD target transcripts. Moreover, we use coding sequence (CDS) information from Ensembl, Gencode consensus Ribo-seq ORFs, and OpenProt to enhance the CDS annotation of novel transcript isoforms. In summary, 302,889 transcripts were obtained from the transcriptome assembly process, of which 24% are absent from Ensembl database annotations, 48,213 contain a premature stop codon, and 6433 are significantly upregulated in three or more comparisons of NMD-active versus NMD-deficient cell lines. We present an in-depth view of these results through the NMDtxDB database, which is available at https://shiny.dieterichlab.org/app/NMDtxDB and supports the study of NMD-sensitive transcripts. We open-sourced our implementation of the web application and the analysis workflow at https://github.com/dieterich-lab/NMDtxDB and https://github.com/dieterich-lab/nmd-wf, respectively.
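
The criterion of significant upregulation in three or more comparisons can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout, the comparison labels, the function name `upregulated_in_at_least`, and the thresholds (adjusted p-value below 0.05, log2 fold change above 0) are all assumptions made for illustration, and the abstract does not state the cutoffs actually used.

```python
from collections import defaultdict

# Hypothetical differential-expression records:
# (transcript_id, comparison_label, log2_fold_change, adjusted_p_value).
# Labels and values are invented for illustration only.
records = [
    ("tx1", "cmp_A", 2.1, 0.001),
    ("tx1", "cmp_B", 1.4, 0.010),
    ("tx1", "cmp_C", 0.9, 0.030),
    ("tx2", "cmp_A", 1.8, 0.002),
    ("tx2", "cmp_B", 1.1, 0.020),
    ("tx2", "cmp_C", 0.7, 0.200),   # not significant
    ("tx3", "cmp_A", -1.5, 0.001),  # significant but downregulated
    ("tx3", "cmp_B", -1.2, 0.004),
    ("tx3", "cmp_C", -2.0, 0.001),
]


def upregulated_in_at_least(records, min_comparisons=3, alpha=0.05, min_lfc=0.0):
    """Return transcripts significantly upregulated in >= min_comparisons.

    alpha and min_lfc are assumed thresholds, not values from the paper.
    """
    hits = defaultdict(set)
    for transcript, comparison, lfc, padj in records:
        # Count a comparison only if it is both significant and upregulated.
        if padj < alpha and lfc > min_lfc:
            hits[transcript].add(comparison)
    return sorted(t for t, comps in hits.items() if len(comps) >= min_comparisons)


print(upregulated_in_at_least(records))
```

With these toy records only `tx1` passes the filter: `tx2` is significant in just two comparisons, and `tx3` is significant but downregulated.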

Keywords: alternative splicing; computational workflow; mRNA decay; nonsense-mediated RNA decay; premature stop codon.

MeSH terms

  • Codon, Nonsense / genetics
  • Databases, Genetic
  • Humans
  • Molecular Sequence Annotation*
  • Nonsense Mediated mRNA Decay* / genetics
  • Open Reading Frames / genetics
  • RNA, Messenger* / genetics
  • RNA, Messenger* / metabolism
  • Transcriptome

Substances

  • RNA, Messenger
  • Codon, Nonsense