Improving reliability and absolute quantification of human brain microarray data by filtering and scaling probes using RNA-Seq

BMC Genomics. 2014 Feb 24;15(1):154. doi: 10.1186/1471-2164-15-154.

Abstract

Background: High-throughput sequencing is gradually replacing microarrays as the preferred method for studying mRNA expression levels, providing nucleotide resolution and accurately measuring absolute expression levels of almost any transcript, known or novel. However, existing microarray data from clinical, pharmaceutical, and academic settings represent valuable and often underappreciated resources, and methods for assessing and improving the quality of these data are lacking.

Results: To quantitatively assess the quality of microarray probes, we directly compare RNA-Seq to Agilent microarrays by processing 231 unique samples from the Allen Human Brain Atlas using RNA-Seq. Both techniques provide highly consistent, highly reproducible gene expression measurements in adult human brain, with RNA-Seq slightly outperforming microarray results overall. We show that RNA-Seq can be used as ground truth to assess the reliability of most microarray probes, remove probes with off-target effects, and scale probe intensities to match the expression levels identified by RNA-Seq. These sequencing scaled microarray intensities (SSMIs) provide more reliable, quantitative estimates of absolute expression levels for many genes when compared with unscaled intensities. Finally, we validate this result in two human cell lines, showing that linear scaling factors can be applied across experiments using the same microarray platform.
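The filter-then-scale idea in the Results can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' published procedure. It assumes a probe is "reliable" when its log2 intensity correlates with its gene's log2 RNA-Seq expression across matched samples (Pearson r at or above a hypothetical `min_r` threshold). It assumes the "linear scaling factor" is an ordinary least-squares line fit in log2 space. The function names and the probe-to-gene mapping are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my  # degenerate case: best constant predictor
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def learn_scaling_factors(
    array_log2: Dict[str, List[float]],   # probe -> log2 intensities per sample
    rnaseq_log2: Dict[str, List[float]],  # gene -> log2 expression, same sample order
    probe_to_gene: Dict[str, str],        # hypothetical probe annotation
    min_r: float = 0.5,                   # hypothetical reliability threshold
) -> Dict[str, Tuple[float, float]]:
    """Keep probes that agree with RNA-Seq and return a (slope, intercept) per probe."""
    factors: Dict[str, Tuple[float, float]] = {}
    for probe, intensities in array_log2.items():
        gene = probe_to_gene.get(probe)
        expr = rnaseq_log2.get(gene) if gene is not None else None
        if expr is None or len(expr) != len(intensities) or len(intensities) < 3:
            continue  # no usable ground truth for this probe
        if pearson(intensities, expr) < min_r:
            continue  # treat as unreliable or off-target and drop it
        factors[probe] = fit_linear(intensities, expr)
    return factors


def apply_scaling(
    factors: Dict[str, Tuple[float, float]],
    new_array_log2: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Map intensities from a new experiment on the same platform to RNA-Seq scale."""
    return {
        probe: [factors[probe][0] * v + factors[probe][1] for v in values]
        for probe, values in new_array_log2.items()
        if probe in factors  # probes filtered out during learning are dropped
    }


if __name__ == "__main__":
    arrays = {"P1": [1.0, 2.0, 3.0, 4.0], "P2": [1.0, 2.0, 3.0, 4.0]}
    seq = {"G1": [3.0, 5.0, 7.0, 9.0], "G2": [4.0, 3.0, 2.0, 1.0]}
    f = learn_scaling_factors(arrays, seq, {"P1": "G1", "P2": "G2"})
    print(f)  # P2 is anti-correlated with its gene, so only P1 survives
    print(apply_scaling(f, {"P1": [0.0, 5.0], "P2": [1.0]}))
```

The example separates learning from applying. This mirrors the abstract's point that factors derived once can be reused across experiments on the same microarray platform. A real analysis would choose the correlation measure, threshold and fit to suit the data.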

Conclusions: Microarrays provide consistent, reproducible gene expression measurements, which are improved using RNA-Seq as ground truth. We expect that our strategy could be used to improve probe quality for many data sets from major existing repositories.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Brain / metabolism*
  • Cluster Analysis
  • Computational Biology / methods
  • Gene Expression
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / standards
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Neocortex / metabolism
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / standards
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods*
  • Sequence Analysis, RNA / standards
  • Transcriptome