Validation of alternative methods of data normalization in gene co-expression studies

Bioinformatics. 2005 Apr 1;21(7):1112-20. doi: 10.1093/bioinformatics/bti124. Epub 2004 Nov 25.

Abstract

Motivation: Clusters of genes encoding proteins with related functions, or in the same regulatory network, often exhibit expression patterns that are correlated over a large number of conditions. Protein associations and gene regulatory networks can be modelled from expression data. We address the question of which of several normalization methods is optimal prior to computing the correlation of the expression profiles between every pair of genes.

Results: We use gene expression data from five experiments with a total of 78 hybridizations and 23 diverse conditions. Nine methods of data normalization are explored based on all possible combinations of normalization techniques according to between and within gene and experiment variation. We compare the resulting empirical distribution of gene x gene correlations with the expectations and apply cross-validation to test the performance of each method in predicting accurate functional annotation. We conclude that normalization methods based on mixed-model equations are optimal.

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Benchmarking / methods
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / standards
  • Gene Expression Regulation / physiology*
  • Models, Genetic*
  • Models, Statistical
  • Numerical Analysis, Computer-Assisted
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / standards
  • Software