Improved moderation for gene-wise variance estimation in RNA-Seq via the exploitation of external information

BMC Genomics. 2013;14 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2164-14-S1-S9. Epub 2013 Jan 21.

Abstract

Background: The cost of RNA-Seq has been decreasing over the last few years. Despite this, experiments with four or fewer biological replicates are still quite common. With so little replication, estimating the variances of gene expression estimates is both a challenging and an interesting problem. However, the wealth of microarray and other gene expression data readily accessible in public repositories can be leveraged to improve variance estimation.

Results: We have proposed a novel approach called Tshrink+ for inferring differential gene expression through improved modelling of the gene-wise variances. Existing methods share information between genes of similar average expression by shrinking, or moderating, the gene-wise variances to a fitted common variance. We have been able to achieve improved estimation of the common variance by using gene-wise sample variances from external experiments, as well as gene length.
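
The general idea of moderating a gene-wise variance towards a fitted common variance can be illustrated with a minimal sketch. This is not the Tshrink+ estimator itself, which the abstract does not specify; it assumes a simple degrees-of-freedom weighted average of the two variances. The function name, the expression values, the common variance and the prior degrees of freedom are all hypothetical and chosen only for illustration.

```python
from statistics import variance


def moderate_variance(sample_var, common_var, df_sample, df_prior):
    """Shrink a gene-wise sample variance towards a fitted common variance.

    Assumed form (illustrative, not taken from the paper): a weighted
    average in which each variance is weighted by its degrees of freedom.
    """
    if df_sample <= 0 or df_prior < 0:
        raise ValueError("degrees of freedom must be positive (prior may be zero)")
    return (df_prior * common_var + df_sample * sample_var) / (df_prior + df_sample)


# Hypothetical log-scale expression values for one gene with three replicates.
replicates = [5.1, 5.9, 5.3]
s2 = variance(replicates)      # unbiased sample variance
df = len(replicates) - 1       # residual degrees of freedom

# Hypothetical fitted common variance for genes of similar average expression,
# and a hypothetical prior weight expressed as degrees of freedom.
common = 0.10
prior_df = 4.0

moderated = moderate_variance(s2, common, df, prior_df)
print(f"sample variance:    {s2:.4f}")
print(f"moderated variance: {moderated:.4f}")
```

With few replicates the sample variance has few degrees of freedom, so under this weighting the moderated value sits closer to the common variance. A better estimate of the common variance, for instance one informed by external datasets or gene length as described above, would therefore carry through directly to the moderated estimate.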

Conclusions: Using biological data, we show that utilising additional external information can improve the modelling of the common variance and hence the calling of differentially expressed genes. These sources of additional information include gene length and gene-wise sample variances from other RNA-Seq and microarray datasets, of both related and seemingly unrelated tissue types. The results are promising: our differential expression test, Tshrink+, performs favourably against existing methods such as DESeq and edgeR in both gene ranking and sensitivity. These improved variance models could easily be implemented in both DESeq and edgeR, and they highlight the need for a database that offers a profile of gene variances over a range of tissue types and organisms.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Area Under Curve
  • Databases, Factual
  • Gene Expression
  • Genome*
  • Mice
  • Mice, Inbred C57BL
  • RNA / chemistry
  • RNA / metabolism*
  • ROC Curve
  • Sequence Analysis, RNA*

Substances

  • RNA