Integrative analysis of prognosis data on multiple cancer subtypes

Biometrics. 2014 Sep;70(3):480-8. doi: 10.1111/biom.12177. Epub 2014 Apr 25.

Abstract

In cancer research, profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Cancer is diverse. Examining the similarities and differences in the genetic basis of multiple subtypes of the same cancer can lead to a better understanding of their connections and distinctions. Classic meta-analysis methods analyze each subtype separately and then compare analysis results across subtypes. Integrative analysis methods, in contrast, analyze the raw data on multiple subtypes simultaneously and can outperform meta-analysis methods. In this study, prognosis data on multiple subtypes of the same cancer are analyzed. An AFT (accelerated failure time) model is adopted to describe survival. The genetic basis of multiple subtypes is described using the heterogeneity model, which allows a gene/SNP to be associated with prognosis of some subtypes but not others. A compound penalization method is developed to identify genes that contain important SNPs associated with prognosis. The proposed method has an intuitive formulation and is realized using an iterative algorithm. Asymptotic properties are rigorously established. Simulation shows that the proposed method has satisfactory performance and outperforms a penalization-based meta-analysis method and a regularized thresholding method. An NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements is analyzed. Genes associated with the three major subtypes, namely DLBCL, FL, and CLL/SLL, are identified. The genes identified by the proposed method differ from those identified by the alternatives, have important implications, and show satisfactory prediction performance.

Keywords: Cancer prognosis; Integrative analysis; Marker identification; Penalization.
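
As an illustration only, the heterogeneity model described in the abstract can be pictured as a coefficient matrix with one row per SNP and one column per subtype, where each subtype has its own sparsity pattern. The sketch below is not the authors' method. The SNP-to-gene map, the coefficient values, the sample size, the 0/1/2 genotype coding, and the names `gene_of_snp`, `beta`, and `simulate_log_times` are all hypothetical, and censoring is omitted for brevity. Only the subtype labels and the log-linear AFT form come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Subtype labels taken from the abstract.
subtypes = ["DLBCL", "FL", "CLL/SLL"]

# Hypothetical map from each of 8 SNPs to one of 3 genes.
gene_of_snp = np.array([0, 0, 0, 1, 1, 2, 2, 2])

# Hypothetical coefficients: rows are SNPs, columns are subtypes.
# Under the heterogeneity model a SNP may matter for some subtypes only.
beta = np.zeros((gene_of_snp.size, len(subtypes)))
beta[0, :] = [0.8, 0.5, 0.0]   # SNP 0: associated with DLBCL and FL only
beta[1, 0] = -0.6              # SNP 1: associated with DLBCL only
beta[5, 2] = 0.7               # SNP 5: associated with CLL/SLL only
# Gene 1 (SNPs 3 and 4) is left unassociated with every subtype.


def simulate_log_times(beta_m, n, rng):
    """Draw uncensored log survival times from a linear AFT model,
    log T = X @ beta + error, with genotypes coded 0/1/2 (an assumption)."""
    X = rng.integers(0, 3, size=(n, beta_m.size)).astype(float)
    log_t = X @ beta_m + rng.normal(scale=0.5, size=n)
    return X, log_t


# One raw data set per subtype, as an integrative analysis would take as input.
data = {
    name: simulate_log_times(beta[:, m], 100, rng)
    for m, name in enumerate(subtypes)
}

# A gene counts as relevant to a subtype if it contains at least one
# SNP with a nonzero coefficient for that subtype.
snp_active = beta != 0
n_genes = gene_of_snp.max() + 1
gene_active = np.zeros((n_genes, len(subtypes)), dtype=bool)
for g in range(n_genes):
    gene_active[g] = snp_active[gene_of_snp == g].any(axis=0)
```

Here `gene_active` records, for each gene and subtype, whether the gene contains an important SNP. This is the two-level structure (genes, then SNPs within genes) that a compound penalty would be expected to recover from data. The estimation step itself is not sketched, because the abstract does not specify the penalty or the algorithm.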

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Data Interpretation, Statistical*
  • Genetic Markers / genetics
  • Genetic Predisposition to Disease / epidemiology
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Meta-Analysis as Topic*
  • Neoplasms / genetics*
  • Neoplasms / mortality*
  • Polymorphism, Single Nucleotide / genetics*
  • Prevalence
  • Prognosis
  • Reproducibility of Results
  • Risk Assessment / methods
  • Sensitivity and Specificity
  • Survival Analysis*
  • Systems Integration

Substances

  • Genetic Markers