Background: Oligonucleotide arrays have become one of the most widely used high-throughput tools in biology. Due to their sensitivity to experimental conditions, normalization is a crucial step when comparing measurements from these arrays. Normalization is, however, far from a solved problem. Frequently, we encounter datasets with substantial technical effects that currently available methods cannot correct.
Results: We show that by a careful decomposition of probe-specific amplification, hybridization and array location effects, a normalization can be performed that allows for a much improved analysis of these data. Identification of the technical sources of variation between arrays has allowed us to build statistical models that are used to estimate how the signal of individual probes is affected, based on their properties. This enables a model-based normalization that is probe-specific, in contrast to the signal-intensity distribution normalization performed by many current methods. In addition, we propose a novel way of handling background correction, enabling the use of background information to weight probes during summarization. Tests of the proposed method show a much improved detection of differentially expressed genes over previously proposed methods, even on (experimentally tightly controlled and replicated) spike-in datasets.
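As an illustration of the kind of probe-level decomposition described above, the measured log-intensity can be thought of as a sum of a gene-expression term and probe- and array-specific technical terms; the parameterisation below is a minimal sketch for intuition, with symbols ($\mu_{g(p)}$, $\alpha$, $\beta_a$, $\gamma_a$, probe sequence $s_p$, coordinates $(x_p, y_p)$) chosen for illustration rather than taken from the model actually fitted in this work:
\[
\log_2 I_{pa} \;=\; \mu_{g(p)} \;+\; \alpha\!\left(s_p\right) \;+\; \beta_a \;+\; \gamma_a\!\left(x_p, y_p\right) \;+\; \varepsilon_{pa},
\]
where $I_{pa}$ is the intensity of probe $p$ on array $a$, $\mu_{g(p)}$ the expression of the gene targeted by probe $p$, $\alpha(s_p)$ a sequence-dependent amplification/hybridization effect estimated from probe properties (e.g. GC content), $\beta_a$ an overall array effect, $\gamma_a(x_p, y_p)$ a smooth spatial effect at the probe's location on array $a$, and $\varepsilon_{pa}$ residual noise. Estimating the technical terms and removing them from the signal yields a probe-specific, model-based normalization of the type described.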
Conclusions: When a limited number of arrays is available, or when arrays are run in different batches, technical effects have a large influence on the measured expression of genes. We show that detailed modelling and correction of these technical effects allows for an improved analysis in these situations.