Identifying gene pathways associated with cancer characteristics via sparse statistical methods

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):966-72. doi: 10.1109/TCBB.2012.48.

Abstract

We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate pathway knowledge into the model, we define a set of pathway activities from microarray gene expression data using Sparse Probabilistic Principal Component Analysis (SPPCA). We then formulate a logistic regression model of the cancer phenotype on these pathway activities. To select the pathway activities related to binary cancer phenotypes, we estimate the parameters with the elastic net and derive a model selection criterion for choosing the tuning parameters involved in the estimation. The proposed method can also reverse-engineer gene networks from the multiple pathways identified, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through an analysis of breast cancer gene expression data.
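The two-stage idea in the abstract (summarize each pathway as an activity score, then select activities with an elastic-net-penalized logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a thresholded power iteration as a simple stand-in for SPPCA, a plain proximal gradient solver for the elastic net, and fixed tuning parameters (`lam`, `alpha`, `sparsity`) in place of the derived model selection criterion. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 penalty)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def pathway_activity(X, genes, sparsity=0.1, iters=50):
    """Score one pathway per sample with a sparse leading component.

    Stand-in for SPPCA (assumption): power iteration on the centered
    gene submatrix, soft-thresholding the loadings at every step.
    """
    n, p = len(X), len(genes)
    cols = [[row[g] for row in X] for g in genes]
    cols = [[v - sum(c) / n for v in c] for c in cols]  # center each gene
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]
        w_new = [sum(cols[j][i] * scores[i] for i in range(n)) for j in range(p)]
        scale = max(abs(v) for v in w_new) or 1.0
        w_new = [soft_threshold(v / scale, sparsity) for v in w_new]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0.0:
            break  # keep the previous loadings if everything was zeroed
        w = [v / norm for v in w_new]
    return [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]

def standardize(a):
    """Center a score vector and scale it to unit variance."""
    m = sum(a) / len(a)
    sd = math.sqrt(sum((v - m) ** 2 for v in a) / len(a)) or 1.0
    return [(v - m) / sd for v in a]

def fit_elastic_net_logistic(A, y, lam=0.05, alpha=0.5, step=0.1, iters=2000):
    """Proximal gradient descent for elastic-net logistic regression.

    Penalty: lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2).
    The intercept b0 is not penalized.
    """
    n, k = len(A), len(A[0])
    b0, beta = 0.0, [0.0] * k
    for _ in range(iters):
        resid = []
        for i in range(n):
            z = b0 + sum(beta[j] * A[i][j] for j in range(k))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            resid.append(1.0 / (1.0 + math.exp(-z)) - y[i])
        b0 -= step * sum(resid) / n
        for j in range(k):
            grad = sum(resid[i] * A[i][j] for i in range(n)) / n
            grad += lam * (1.0 - alpha) * beta[j]  # ridge part of the penalty
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam * alpha)
    return b0, beta

# Synthetic data (hypothetical): genes 0-2 share a latent factor that
# determines the phenotype; genes 3-5 are pure noise.
random.seed(0)
X, y = [], []
for _ in range(60):
    latent = random.gauss(0.0, 1.0)
    row = [latent + random.gauss(0.0, 0.3) for _ in range(3)]
    row += [random.gauss(0.0, 1.0) for _ in range(3)]
    X.append(row)
    y.append(1 if latent > 0 else 0)

pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}
activities = [standardize(pathway_activity(X, g)) for g in pathways.values()]
A = [[act[i] for act in activities] for i in range(len(X))]  # samples x pathways
b0, beta = fit_elastic_net_logistic(A, y)
for name, coef in zip(pathways, beta):
    print(name, round(coef, 3))
```

In this toy setting the coefficient for the phenotype-driving pathway should dominate the one for the noise pathway. In practice the tuning parameters would be chosen by a selection criterion such as the one the abstract describes, rather than fixed by hand.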

MeSH terms

  • Breast Neoplasms / genetics*
  • Computational Biology / methods*
  • Databases, Genetic
  • Female
  • Gene Expression Profiling
  • Gene Regulatory Networks*
  • Humans
  • Logistic Models
  • Oligonucleotide Array Sequence Analysis
  • Principal Component Analysis