Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression

Genome Res. 1999 Oct;9(10):950-9. doi: 10.1101/gr.9.10.950.

Abstract

Large, publicly available collections of expressed sequence tags (ESTs) have been generated from Arabidopsis thaliana and rice (Oryza sativa). A potential but relatively unexplored application of these data is the study of plant gene expression. Other EST data, mainly from human and mouse, have been used successfully to identify genes with tissue- or disease-specific expression and to detect alternative transcripts. In this report, we go a step further by showing that computer analyses of plant EST data can generate evidence of correlated expression patterns of genes across various tissues. Furthermore, tissue types and organs can be classified relative to one another on the basis of their global gene expression patterns. As in previous studies, expression profiles are first estimated from EST counts. By clustering gene expression profiles or whole cDNA library profiles, we show that genes with similar functions, or cDNA libraries expected to share patterns of gene expression, are grouped together. Promising uses of this technique include functional genomics, in which evidence of correlated expression might complement (or substitute for) evidence of sequence similarity in the annotation of anonymous genes and the identification of surrogate markers. The analysis presented here combines a correlation-based clustering method with a graphical color map that allows intuitive visualization of patterns within a large table of expression measurements.
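The two steps summarized above, estimating expression profiles from EST counts and then clustering them by correlation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a gene's expression in a library is its EST count divided by the total number of ESTs sequenced from that library, that profile similarity is the Pearson correlation coefficient r, and that profiles are merged by average-linkage hierarchical clustering on the distance 1 - r. The function names (`expression_profiles`, `pearson`, `average_linkage`) and the toy counts are hypothetical. Clustering cDNA libraries instead of genes only requires transposing the count table.

```python
from itertools import combinations
from math import sqrt


def expression_profiles(counts, library_sizes):
    """Convert raw EST counts into relative expression levels (assumed normalization).

    counts: dict mapping a gene id to a list of EST counts, one per cDNA library.
    library_sizes: total number of ESTs sequenced in each library.
    """
    return {
        gene: [c / n for c, n in zip(row, library_sizes)]
        for gene, row in counts.items()
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # r is undefined for a constant profile; treat it as uncorrelated (assumption).
        return 0.0
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of gene ids.
    """
    clusters = [(g,) for g in profiles]
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        # Average of all pairwise gene distances between the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat until one remains.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


if __name__ == "__main__":
    # Hypothetical toy counts for three genes across three libraries.
    counts = {"geneA": [10, 2, 0], "geneB": [20, 5, 1], "geneC": [0, 3, 12]}
    profiles = expression_profiles(counts, [1000, 1000, 1000])
    for a, b, d in average_linkage(profiles):
        print(a, b, round(d, 4))
```

With these toy counts, geneA and geneB rise and fall together across the libraries, so they are merged first. Ordering the rows of the count table by the merge history and shading each cell by its relative expression would give a simple color map of the kind the abstract describes. The paper's actual normalization, similarity measure, and linkage rule may differ from the choices assumed here.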

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Contig Mapping
  • DNA, Complementary / analysis
  • Databases, Factual
  • Expressed Sequence Tags*
  • Gene Expression Regulation, Plant*
  • Models, Genetic
  • Models, Statistical
  • Multigene Family
  • Oryza / genetics*

Substances

  • DNA, Complementary