Making sense of microarray data distributions

Bioinformatics. 2002 Apr;18(4):576-84. doi: 10.1093/bioinformatics/18.4.576.

Abstract

Motivation: Typical analysis of microarray data has focused on spot-by-spot comparisons within a single organism. Less attention has been given to comparing the entire distribution of spot intensities between experiments and between organisms.

Results: Here we show that mRNA transcription data from a wide range of organisms, measured with a range of experimental platforms, show close agreement with Benford's law (Benford, Proc. Am. Phil. Soc., 78, 551-572, 1938) and Zipf's law (Zipf, The Psycho-biology of Language: an Introduction to Dynamic Philology, 1936 and Human Behaviour and the Principle of Least Effort, 1949). The bulk of the distribution of microarray spot intensities is well approximated by a log-normal, while the tail of the distribution is closer to a power law. The variance, σ², of log spot intensity shows a positive correlation with genome size (in terms of number of genes) and is therefore relatively fixed within some range for a given organism. The measured value of σ² can be significantly smaller than the expected value if the mRNA is extracted from a sample of mixed cell types. Our research demonstrates that useful biological findings may result from analyzing microarray data at the level of entire intensity distributions.
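
As a rough illustration of the distribution-level checks described in the Results, the following Python sketch compares leading-digit frequencies against Benford's law, P(d) = log10(1 + 1/d), and estimates the variance of log intensity. It is not the authors' code: the intensities are simulated from a log-normal distribution, and the parameter values, sample size, and all function names are assumptions chosen only for the example.

```python
import math
import random
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant decimal digit of a positive number.

    Scientific notation always starts with that digit, e.g. '3.27e+02'.
    """
    return int(f"{x:.15e}"[0])


def first_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def log_variance(values):
    """Sample variance of the natural log of the positive values."""
    logs = [math.log(v) for v in values if v > 0]
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)


# Simulated stand-in for spot intensities (assumption: log-normal with
# mean 5.0 and standard deviation 2.0 on the natural-log scale).
rng = random.Random(0)
MU, SIGMA = 5.0, 2.0
intensities = [rng.lognormvariate(MU, SIGMA) for _ in range(50000)]

observed = first_digit_frequencies(intensities)
expected = benford_expected()
max_dev = max(abs(observed[d] - expected[d]) for d in range(1, 10))
sigma2 = log_variance(intensities)

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:.4f}, Benford {expected[d]:.4f}")
print(f"largest deviation from Benford: {max_dev:.4f}")
print(f"estimated variance of log intensity: {sigma2:.3f}")
```

With real data, the simulated list would be replaced by background-corrected, positive spot intensities from a single array. A broad log-normal sample such as this one spans several orders of magnitude, which is the usual condition under which leading digits fall close to Benford's proportions.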

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Animals
  • Chi-Square Distribution
  • Databases, Genetic*
  • Genome*
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / instrumentation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Pattern Recognition, Automated
  • RNA, Messenger / genetics
  • Sensitivity and Specificity

Substances

  • RNA, Messenger