Group normalization for genomic data

Mahmoud Ghandi; Michael A Beer

doi:10.1371/journal.pone.0038695

PLoS One. 2012;7(8):e38695. doi: 10.1371/journal.pone.0038695. Epub 2012 Aug 13.

Authors

Mahmoud Ghandi¹, Michael A Beer

Affiliation

¹ McKusick-Nathans Institute of Genetic Medicine and the Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.

Abstract

Data normalization is a crucial preliminary step in analyzing genomic datasets. The goal of normalization is to remove global variation to make readings across different experiments comparable. In addition, most genomic loci have non-uniform sensitivity to any given assay because of variation in local sequence properties. In microarray experiments, this non-uniform sensitivity is due to different DNA hybridization and cross-hybridization efficiencies, known as the probe effect. In this paper we introduce a new scheme, called Group Normalization (GN), to remove both global and local biases in one integrated step, whereby we determine the normalized probe signal by finding a set of reference probes with similar responses. Compared to conventional normalization methods such as Quantile normalization and physically motivated probe effect models, our proposed method is general in the sense that it does not require the assumption that the underlying signal distribution be identical for the treatment and control, and is flexible enough to correct for nonlinear and higher order probe effects. The Group Normalization algorithm is computationally efficient and easy to implement. We also describe a variant of the Group Normalization algorithm, called Cross Normalization, which efficiently amplifies biologically relevant differences between any two genomic datasets.
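The core idea in the abstract, normalizing each probe against a set of reference probes with similar responses, can be illustrated with a minimal sketch. The abstract does not give the details of the algorithm, so everything specific here is an assumption made for illustration: the function name `group_normalize`, choosing the reference set as the `k` probes nearest in control-signal rank, and scoring each probe as a z-score of its treatment signal within that set. It should not be read as the authors' implementation.

```python
import numpy as np

def group_normalize(control, treatment, k=50):
    """Illustrative sketch only (not the published algorithm).

    Assumption: the reference set of a probe is the window of k probes
    adjacent to it when probes are ranked by control signal (the probe
    itself is included). The normalized value is the probe's treatment
    signal expressed as a z-score within that reference set.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    n = control.size
    k = min(k, n)

    # Rank probes by control signal so similar-response probes are adjacent.
    order = np.argsort(control, kind="stable")
    t_sorted = treatment[order]

    # Prefix sums give the mean and variance of every window in O(n).
    csum = np.concatenate(([0.0], np.cumsum(t_sorted)))
    csum2 = np.concatenate(([0.0], np.cumsum(t_sorted ** 2)))

    # Window [lo, hi) of k probes around each rank, clipped at the ends.
    lo = np.clip(np.arange(n) - k // 2, 0, n - k)
    hi = lo + k

    mean = (csum[hi] - csum[lo]) / k
    var = (csum2[hi] - csum2[lo]) / k - mean ** 2
    std = np.sqrt(np.maximum(var, 1e-12))  # guard against zero variance

    # Map the scores back to the original probe order.
    normalized = np.empty(n)
    normalized[order] = (t_sorted - mean) / std
    return normalized

# Hypothetical usage: the probe effect is shared by both arrays, and one
# probe carries an additional treatment-specific signal.
control = np.linspace(0.0, 1.0, 200)
treatment = 2.0 * control + 1.0
treatment[100] += 5.0
scores = group_normalize(control, treatment, k=50)
```

Because each probe is compared only with probes that respond similarly in the control, a shared probe effect, including a nonlinear one, largely cancels within the reference set. Unlike Quantile normalization, the sketch makes no assumption that treatment and control share the same overall signal distribution.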

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking
Data Interpretation, Statistical*
Genetic Loci / genetics
Genomics / methods*
Glucose Transport Proteins, Facilitative / genetics
Histones / genetics
Mutation
Saccharomyces cerevisiae Proteins / genetics

Substances

Glucose Transport Proteins, Facilitative
HXT3 protein, S cerevisiae
Histones
Saccharomyces cerevisiae Proteins

Grants and funding

This work was supported by the Searle Scholars program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.