A statistical method for identifying differential gene-gene co-expression patterns

Bioinformatics. 2004 Nov 22;20(17):3146-55. doi: 10.1093/bioinformatics/bth379. Epub 2004 Jul 1.

Abstract

Motivation: To understand cancer etiology, it is important to explore the molecular changes in cellular processes that occur between the normal state and the cancerous state. Because genes interact with each other during cellular processes, carcinogenesis-related genes may form differential co-expression patterns with other genes in different cell states. In this study, we develop a statistical method for identifying differential gene-gene co-expression patterns in different cell states.

Results: For efficient pattern recognition, we extend the traditional F-statistic and obtain an Expected Conditional F-statistic (ECF-statistic), which incorporates statistical information on both location and correlation. We also propose a statistical method for data transformation. Our approach is applied to a microarray gene expression dataset from a prostate cancer study. For a gene of interest, our method can select other genes that have differential gene-gene co-expression patterns with this gene in different cell states. The 10 most frequently selected genes include hepsin, GSTP1 and AMACR, which have recently been proposed to be associated with prostate carcinogenesis. However, GSTP1 and AMACR cannot be identified by studying differential gene expression alone. Using the tumor suppressor genes TP53, PTEN and RB1 as genes of interest, we identify seven genes that also include hepsin, GSTP1 and AMACR. We show that genes associated with cancer may have differential gene-gene co-expression patterns with many other genes in different cell states. By discovering such patterns, we may be able to identify carcinogenesis-related genes.
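
Illustration: the abstract does not give the formula for the ECF-statistic, so the sketch below does not implement it. As a rough stand-in for the general idea of differential co-expression, it compares the Pearson correlation of a gene pair in two cell states using the standard Fisher z-transformation. The function name `fisher_z_difference`, the synthetic data, and the sample sizes are all assumptions made for this example and do not come from the paper.

```python
import numpy as np


def fisher_z_difference(x_a, y_a, x_b, y_b):
    """Compare the correlation of a gene pair between two cell states.

    NOTE: this is a generic Fisher z-test on correlations, used here only
    as an illustration. It is NOT the ECF-statistic from the paper.
    Returns (z_score, r_a, r_b).
    """
    x_a, y_a, x_b, y_b = (
        np.asarray(v, dtype=float) for v in (x_a, y_a, x_b, y_b)
    )
    # Pearson correlation of the gene pair within each state.
    r_a = np.corrcoef(x_a, y_a)[0, 1]
    r_b = np.corrcoef(x_b, y_b)[0, 1]
    # Clip so that arctanh stays finite when |r| is exactly 1.
    r_a, r_b = np.clip([r_a, r_b], -0.999999, 0.999999)
    # Fisher z-transformation; each z has variance about 1 / (n - 3).
    z_a, z_b = np.arctanh(r_a), np.arctanh(r_b)
    se = np.sqrt(1.0 / (len(x_a) - 3) + 1.0 / (len(x_b) - 3))
    return (z_a - z_b) / se, r_a, r_b


# Synthetic (hypothetical) expression values for one gene pair:
# positively co-expressed in "normal" samples, negatively in "tumor" samples.
rng = np.random.default_rng(0)
normal_x = rng.normal(size=50)
normal_y = normal_x + rng.normal(scale=0.3, size=50)
tumor_x = rng.normal(size=50)
tumor_y = -tumor_x + rng.normal(scale=0.3, size=50)

z, r_normal, r_tumor = fisher_z_difference(normal_x, normal_y, tumor_x, tumor_y)
print(f"r(normal)={r_normal:.2f}  r(tumor)={r_tumor:.2f}  z={z:.1f}")
```

In this toy setup both genes have the same mean in the two states, so a per-gene differential expression test would see nothing, while the change in correlation is large. This mirrors the abstract's remark that some genes cannot be identified from differential expression alone.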

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor / metabolism*
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Male
  • Models, Biological
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Prostatic Neoplasms / classification*
  • Prostatic Neoplasms / diagnosis*
  • Prostatic Neoplasms / metabolism
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Biomarkers, Tumor