Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing

Graham Heimberg; Rajat Bhatnagar; Hana El-Samad; Matt Thomson

doi:10.1016/j.cels.2016.04.001

Cell Syst. 2016 Apr 27;2(4):239-250. doi: 10.1016/j.cels.2016.04.001. Epub 2016 Apr 27.

Authors

Graham Heimberg¹,²,³,^#, Rajat Bhatnagar¹,³,^#, Hana El-Samad¹,³, Matt Thomson³

Affiliations

¹ Department of Biochemistry and Biophysics, California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA.
² Integrative Program in Quantitative Biology, University of California, San Francisco, San Francisco, CA 94158, USA.
³ Center for Systems and Synthetic Biology, University of California, San Francisco, San Francisco, CA 94158, USA.

^# Contributed equally.

Abstract

A tradeoff between precision and throughput constrains all biological measurements, including sequencing-based technologies. Here, we develop a mathematical framework that defines this tradeoff between mRNA-sequencing depth and error in the extraction of biological information. We find that transcriptional programs can be reproducibly identified at 1% of conventional read depths. We demonstrate that this resilience to noise of "shallow" sequencing derives from a natural property, low dimensionality, which is a fundamental feature of gene expression data. Accordingly, our conclusions hold for ∼350 single-cell and bulk gene expression datasets across yeast, mouse, and human. In total, our approach provides quantitative guidelines for the choice of sequencing depth necessary to achieve a desired level of analytical resolution. We codify these guidelines in an open-source read depth calculator. This work demonstrates that the structure inherent in biological networks can be productively exploited to increase measurement throughput, an idea that is now common in many branches of science, such as image processing.
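
The central claim, that low-dimensional structure makes a dominant transcriptional program recoverable from far fewer reads, can be illustrated with a small simulation. The sketch below is not the authors' framework or their read depth calculator. It assumes a synthetic dataset with a single expression program, Poisson read sampling, and arbitrary illustrative depths (10^6 versus 10^4 reads per cell, that is, 1%). All function names and parameters are hypothetical. It compares the leading principal component estimated at the two depths.

```python
import numpy as np

def simulate_counts(rng, profile, depth):
    """Draw read counts per cell: Poisson with mean depth * profile (each row of profile sums to 1)."""
    return rng.poisson(depth * profile)

def top_component(counts):
    """Leading principal component (gene loadings) of depth-normalized counts."""
    norm = counts / counts.sum(axis=1, keepdims=True)
    centered = norm - norm.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def pc_agreement(n_cells=200, n_genes=500, deep=1_000_000, shallow=10_000, seed=0):
    """Absolute cosine similarity between the top PC at deep and shallow depth."""
    rng = np.random.default_rng(seed)
    # Assumption: one latent program modulates a baseline expression profile,
    # so the true expression matrix is approximately rank one after centering.
    base = rng.dirichlet(np.ones(n_genes))
    program = rng.normal(size=n_genes)
    activity = rng.uniform(-1.0, 1.0, size=n_cells)
    profile = np.exp(np.log(base) + 0.5 * np.outer(activity, program))
    profile /= profile.sum(axis=1, keepdims=True)
    pc_deep = top_component(simulate_counts(rng, profile, deep))
    pc_shallow = top_component(simulate_counts(rng, profile, shallow))
    return abs(float(pc_deep @ pc_shallow))

if __name__ == "__main__":
    print(f"PC1 agreement at 1% depth: {pc_agreement():.3f}")
```

In this toy setting the two estimates of the first principal component should point in nearly the same direction, because sampling noise is spread over many genes while the program concentrates its variance along one axis. Real datasets have more programs and other noise sources, so the numbers here say nothing about the depth a given experiment needs.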

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Animals
Gene Expression Profiling
Gene Expression Regulation
Gene Expression*
High-Throughput Nucleotide Sequencing
Humans
Mice
Research Design
Sequence Analysis, DNA
Sequence Analysis, RNA
Software
