Discovering high-order patterns of gene expression levels

J Comput Biol. 2008 Jul-Aug;15(6):625-37. doi: 10.1089/cmb.2007.0147.

Abstract

This paper reports the discovery of statistically significant association patterns of gene expression levels from microarray data. By association patterns, we mean gene expression intensity intervals that have statistically significant associations among themselves and with tissue classes, such as cancerous and normal tissue. We describe how the significance of associations among gene expression levels can be evaluated objectively with a statistical measure; an association that is significant under this measure is called statistically significant. Given a gene expression data set, we first cluster the entire gene pool into groups by optimizing the correlation (more precisely, the interdependence) among the expression levels of the genes within each group. From each group, we select the one or several genes most correlated with the other genes in that group. Together these form a smaller pool containing the most representative genes of the original pool. Our pattern discovery algorithm is then applied, for the first time, to discover significant association patterns among the expression levels of the genes in this small pool. Because our method discovers and expresses associations more effectively in terms of intensity intervals, we discretize each gene's expression levels into intervals that maximize the interdependence between gene expression and tissue class. From the resulting data set of expression intervals, we discover association patterns that are statistically significant, some positively and some negatively associated with different tissue classes. We apply this methodology to the colon-cancer microarray gene expression data set, which consists of 2000 genes measured in 62 samples taken from colon cancer patients or normal subjects.
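The gene-selection and discretization steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that "interdependence" is measured by normalized mutual information, R(X:Y) = I(X;Y)/H(X,Y), that each gene is discretized with a single cut point (two intervals), and that the clustering step has already produced the gene groups. The function names `entropy`, `interdependence`, `discretize_by_class`, and `representative_gene` are hypothetical.

```python
import math
from collections import Counter

# Sketch under assumptions; function names are hypothetical, not from the paper.


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def interdependence(x, y):
    """Assumed interdependence measure: R(X:Y) = I(X;Y) / H(X,Y), in [0, 1].

    Returns 0.0 when the joint entropy is zero (both variables constant).
    """
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 0.0
    mutual_info = entropy(x) + entropy(y) - h_xy
    return mutual_info / h_xy


def discretize_by_class(values, classes):
    """Choose one cut point (two intervals) maximizing R(interval : class).

    Simplification: a real class-dependent discretizer may produce more
    than two intervals per gene.
    """
    distinct = sorted(set(values))
    best_cut, best_r = None, -1.0
    # Candidate cuts are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        intervals = [0 if v <= cut else 1 for v in values]
        r = interdependence(intervals, classes)
        if r > best_r:
            best_cut, best_r = cut, r
    return best_cut, best_r


def representative_gene(group):
    """Pick the gene most interdependent with the rest of its group.

    `group` maps a gene name to its discretized expression levels, one
    entry per sample; the grouping itself is assumed to come from an
    earlier clustering step.
    """
    def total_interdependence(name):
        return sum(interdependence(group[name], group[other])
                   for other in group if other != name)

    return max(group, key=total_interdependence)
```

For example, with expression values `[1.2, 0.8, 3.5, 3.9, 1.0, 4.1]` and tissue labels normal, normal, tumor, tumor, normal, tumor, `discretize_by_class` picks the cut at about 2.35, which separates the two classes perfectly and gives R = 1.0.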
Statistically significant combinations of gene expression levels that repress or activate colon cancer are revealed in this data set. The discovered association patterns are ranked by statistical significance and displayed for interpretation and further analysis.
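The abstract does not name the statistical measure, so the significance test and ranking can only be sketched under an assumption. The sketch below uses an adjusted residual, a standard residual for contingency-table cell counts. For a pattern whose primary events (for example, "gene A in its high interval" and "tissue = tumor") have marginal probabilities p_1, ..., p_k over M samples, the expected count under independence is e = M * (p_1 * ... * p_k), and the adjusted residual of the observed count o is d = (o - e) / sqrt(e * (1 - p_1 * ... * p_k)). Patterns with d > 1.96 are treated as positive associations and those with d < -1.96 as negative ones. The record layout and the names `adjusted_residual` and `rank_patterns` are hypothetical.

```python
import math

# Sketch under assumptions: the adjusted residual stands in for the
# paper's unspecified significance measure; names are hypothetical.


def adjusted_residual(records, pattern):
    """Adjusted residual of a pattern's observed count.

    `records` is a list of samples, each a dict such as
    {"geneA": "high", "class": "tumor"}; `pattern` is a dict of
    attribute -> value pairs (its primary events) that must all match.
    """
    m = len(records)
    observed = sum(all(r.get(a) == v for a, v in pattern.items())
                   for r in records)
    # Marginal probability of each primary event over the m samples.
    probs = [sum(r.get(a) == v for r in records) / m
             for a, v in pattern.items()]
    p_joint = math.prod(probs)  # joint probability if events were independent
    expected = m * p_joint
    variance = expected * (1 - p_joint)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)


def rank_patterns(records, patterns, threshold=1.96):
    """Keep patterns with |d| > threshold and sort by decreasing |d|.

    d > 0 marks a positive association, d < 0 a negative one; 1.96 is
    the two-sided 5% point of the standard normal distribution.
    """
    scored = [(p, adjusted_residual(records, p)) for p in patterns]
    significant = [(p, d) for p, d in scored if abs(d) > threshold]
    return sorted(significant, key=lambda pd: abs(pd[1]), reverse=True)
```

With 20 samples split evenly between {geneA: high, class: tumor} and {geneA: low, class: normal}, the pattern (geneA = high, class = tumor) gets d of about 2.58 and (geneA = high, class = normal) gets d of about -2.58, so both pass the threshold, one as a positive and one as a negative association.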

MeSH terms

  • Algorithms
  • Colonic Neoplasms / metabolism*
  • Gene Expression Profiling*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated*