The precise coordination of important biological processes, such as differentiation and development, depends heavily on the regulation of gene expression. The flow of genetic information is tightly regulated at multiple levels. Among these, RNA export to the cytosol is an essential step in the production of proteins in eukaryotic cells. Estimating the relative concentration of RNA molecules of a given transcript species in the nucleus and in the cytosol is therefore of major significance, as it contributes to the understanding of the dynamics of RNA trafficking between the two compartments. The most efficient way to estimate the levels of RNA species genome-wide is RNA sequencing (RNA-seq). RNA-seq can be performed separately on the nucleus and on the cytosol, but the resulting transcript levels cannot be compared directly: measured levels are relative to the total volume of RNA in each compartment, and this volume is usually unknown. Here, we show theoretically that if whole-cell RNA-seq is performed in addition to nuclear and cytosolic RNA-seq, accurate estimates of transcript localization can be obtained. Based on this result, we designed a method that first estimates the fraction of the total RNA volume that resides in the cytosol (or, equivalently, in the nucleus) and then estimates this fraction for every transcript. We evaluate our methodology on simulated data and on available nuclear and cytosolic single-cell data. Finally, we use our method to investigate the cellular localization of transcripts using bulk RNA-seq data from the ENCODE project.
Keywords: ENCODE4; RNA subcellular localization; transcript nucleo-cytosolic distribution.
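The two-step estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a simple mass-balance model in which the whole-cell relative abundance of transcript i is a mixture of its cytosolic and nuclear relative abundances, w_i = β·c_i + (1 − β)·n_i, where β is the fraction of total RNA in the cytosol. The symbols β, c_i, n_i, w_i, the least-squares fit, and both function names are assumptions introduced here for illustration.

```python
import numpy as np


def estimate_cytosolic_rna_fraction(cyt, nuc, whole):
    """Estimate beta, the fraction of total RNA located in the cytosol.

    Assumed model (not taken from the source): w_i = beta*c_i + (1-beta)*n_i,
    hence (w_i - n_i) = beta * (c_i - n_i), fitted by least squares through
    the origin over all transcripts.
    """
    cyt, nuc, whole = (np.asarray(x, dtype=float) for x in (cyt, nuc, whole))
    # Convert each profile to relative abundances that sum to 1.
    cyt, nuc, whole = cyt / cyt.sum(), nuc / nuc.sum(), whole / whole.sum()
    diff = cyt - nuc
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        raise ValueError("cytosolic and nuclear profiles are identical; beta is not identifiable")
    beta = float(np.dot(whole - nuc, diff)) / denom
    # Clip to the valid range, since noisy data can push the fit outside [0, 1].
    return min(max(beta, 0.0), 1.0)


def transcript_cytosolic_fraction(cyt, nuc, beta):
    """Per-transcript cytosolic fraction under the same assumed model."""
    cyt = np.asarray(cyt, dtype=float)
    nuc = np.asarray(nuc, dtype=float)
    cyt, nuc = cyt / cyt.sum(), nuc / nuc.sum()
    in_cytosol = beta * cyt
    total = in_cytosol + (1.0 - beta) * nuc
    # Transcripts absent from both compartments get NaN rather than 0/0.
    return np.divide(in_cytosol, total, out=np.full_like(total, np.nan), where=total > 0)


if __name__ == "__main__":
    # Toy data: three transcripts, true beta = 0.7 (made-up numbers).
    cyt = np.array([0.5, 0.3, 0.2])
    nuc = np.array([0.2, 0.3, 0.5])
    whole = 0.7 * cyt + 0.3 * nuc
    beta = estimate_cytosolic_rna_fraction(cyt, nuc, whole)
    print(beta)
    print(transcript_cytosolic_fraction(cyt, nuc, beta))
```

Under this assumed model, the whole-cell profile is what makes β identifiable: without it, the nuclear and cytosolic profiles are each normalized separately and carry no information about the relative size of the two RNA pools.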