PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments

Peng Liu; Alexandra A Soukup; Emery H Bresnick; Colin N Dewey; Sündüz Keleş

doi:10.1101/gr.252445.119

Genome Res. 2020 Nov;30(11):1655-1666. doi: 10.1101/gr.252445.119. Epub 2020 Sep 21.

Authors

Peng Liu¹, Alexandra A Soukup², Emery H Bresnick², Colin N Dewey¹,³, Sündüz Keleş¹,⁴

Affiliations

¹ Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706, USA.
² Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Carbone Cancer Center, University of Wisconsin School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, USA.
³ Department of Computer Sciences, University of Wisconsin, Madison, Wisconsin 53706, USA.
⁴ Department of Statistics, University of Wisconsin, Madison, Wisconsin 53706, USA.

Abstract

Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
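The contrast between the '2-Step' and '1-Step' strategies described above can be illustrated with a toy sketch. This is not PRAM's implementation or its R/Bioconductor API: the function names, the junction labels, the read counts, and the `MIN_READS` threshold are all hypothetical, and a simple support cutoff stands in for a real transcript assembler. The point is only that evidence too weak to pass a threshold in any single data set may pass once the data sets are pooled.

```python
from collections import Counter

# Hypothetical minimum number of supporting reads needed to call a junction.
MIN_READS = 3


def call_junctions(counts, min_reads=MIN_READS):
    """Stand-in for an assembler: keep junctions with enough read support."""
    return {junction for junction, n in counts.items() if n >= min_reads}


def two_step(datasets, min_reads=MIN_READS):
    """'2-Step' sketch: call junctions per data set, then merge the calls."""
    merged = set()
    for counts in datasets:
        merged |= call_junctions(counts, min_reads)
    return merged


def one_step(datasets, min_reads=MIN_READS):
    """'1-Step' sketch: pool read counts across data sets, then call once."""
    pooled = Counter()
    for counts in datasets:
        pooled.update(counts)  # adds counts per junction
    return call_junctions(pooled, min_reads)


# Made-up read counts per splice junction in three data sets.
datasets = [
    {"chr1:100-200": 5, "chr1:300-400": 1},
    {"chr1:100-200": 4, "chr1:300-400": 1},
    {"chr1:100-200": 6, "chr1:300-400": 2},
]

print(sorted(two_step(datasets)))
print(sorted(one_step(datasets)))
```

In this made-up example the weakly supported junction never reaches the threshold within a single data set, so merging per-data-set calls omits it, while pooling accumulates four supporting reads and retains it.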

Publication types

Research Support, N.I.H., Extramural
Validation Study

MeSH terms

Animals
Class Ib Phosphatidylinositol 3-Kinase / genetics
DNA, Intergenic
Genomics
Hematopoietic Stem Cells / metabolism
Humans
Mice
RNA / metabolism
RNA-Seq / methods*
Software

Substances

DNA, Intergenic
RNA
Class Ib Phosphatidylinositol 3-Kinase
PIK3CG protein, human
Pik3cg protein, mouse
