Group normalization for genomic data

PLoS One. 2012;7(8):e38695. doi: 10.1371/journal.pone.0038695. Epub 2012 Aug 13.

Abstract

Data normalization is a crucial preliminary step in analyzing genomic datasets. The goal of normalization is to remove global variation to make readings across different experiments comparable. In addition, most genomic loci have non-uniform sensitivity to any given assay because of variation in local sequence properties. In microarray experiments, this non-uniform sensitivity is due to different DNA hybridization and cross-hybridization efficiencies, known as the probe effect. In this paper we introduce a new scheme, called Group Normalization (GN), to remove both global and local biases in one integrated step, whereby we determine the normalized probe signal by finding a set of reference probes with similar responses. Compared to conventional normalization methods such as Quantile normalization and physically motivated probe effect models, our proposed method is general in the sense that it does not require the assumption that the underlying signal distribution be identical for the treatment and control, and is flexible enough to correct for nonlinear and higher order probe effects. The Group Normalization algorithm is computationally efficient and easy to implement. We also describe a variant of the Group Normalization algorithm, called Cross Normalization, which efficiently amplifies biologically relevant differences between any two genomic datasets.
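The abstract describes the core idea, normalizing each probe against a set of reference probes with similar responses, without giving the algorithm itself. The sketch below is one plausible reading of that idea, not the authors' published procedure. It assumes (a) a control experiment is available, (b) "similar response" means similar control signal, and (c) the normalized value is a z-score of the treatment signal within the reference set. The function name `group_normalize_sketch` and the parameter `k` (reference set size) are hypothetical.

```python
import numpy as np

def group_normalize_sketch(control, treatment, k=50):
    """Illustrative only: z-score each probe's treatment signal against the
    k probes whose control signal is closest to its own (assumed reference set)."""
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    n = control.size
    if treatment.size != n:
        raise ValueError("control and treatment must have the same length")
    k = min(k, n)

    # Sorting by control signal places probes with similar responses next to
    # each other, so a reference set is a contiguous window in sorted order.
    order = np.argsort(control, kind="stable")
    sorted_treatment = treatment[order]

    normalized = np.empty(n)
    for rank, probe in enumerate(order):
        # Window of k probes roughly centered on this probe, clamped at the
        # ends; the window always contains the probe itself.
        lo = max(0, min(rank - k // 2, n - k))
        reference = sorted_treatment[lo:lo + k]
        spread = reference.std()
        # Local centering and scaling removes both the global offset/scale
        # and the probe-specific bias shared by the reference set.
        normalized[probe] = (treatment[probe] - reference.mean()) / spread if spread > 0 else 0.0
    return normalized
```

Because each probe is compared only with probes that behave like it in the control, the sketch does not assume that treatment and control share the same overall signal distribution, and a nonlinear dependence of the treatment signal on the probe effect is absorbed locally by the window rather than by an explicit model. The window size `k` trades bias against noise: small windows track the probe effect more closely but give noisier estimates of the local mean and spread.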

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Data Interpretation, Statistical*
  • Genetic Loci / genetics
  • Genomics / methods*
  • Glucose Transport Proteins, Facilitative / genetics
  • Histones / genetics
  • Mutation
  • Saccharomyces cerevisiae Proteins / genetics

Substances

  • Glucose Transport Proteins, Facilitative
  • HXT3 protein, S cerevisiae
  • Histones
  • Saccharomyces cerevisiae Proteins

Grants and funding

This work was supported by the Searle Scholars program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.