Bayesian variable selection for parametric survival model with applications to cancer omics data

Hum Genomics. 2018 Nov 6;12(1):49. doi: 10.1186/s40246-018-0179-x.

Abstract

Background: Modeling thousands of markers simultaneously is of great interest when testing associations between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS) was developed for continuous or binary outcomes; it uses a fast EM algorithm to ease the Bayesian computation. However, it is not suited to analyzing the time-to-event outcomes found in many public databases such as The Cancer Genome Atlas (TCGA).

Results: We extended EMVS to a high-dimensional parametric survival regression framework (SurvEMVS). A variant of the cyclic coordinate descent (CCD) algorithm was used for efficient iteration in the M-step, and the extended Bayesian information criterion (EBIC) was used to tune the hyperparameters. We evaluated the performance of SurvEMVS in numerical simulations and illustrated its effectiveness on two real datasets. Both the simulations and the real data analyses show that SurvEMVS performs well in terms of accuracy and computation. Some potential markers associated with survival in lung or stomach cancer were identified.
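
To illustrate how an EBIC-based choice among hyperparameter settings might work, the sketch below scores candidate fits with one common form of the criterion, EBIC = -2 log L + k log n + 2γ log C(p, k), and keeps the lowest score. This is an assumption for illustration: the abstract does not state which EBIC variant or γ value SurvEMVS uses, and the function names (`log_binomial`, `ebic`, `select_by_ebic`) and the candidate tuple layout are hypothetical.

```python
import math


def log_binomial(p, k):
    """Log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_lik, n, p, k, gamma=0.5):
    """One common form of the extended BIC (assumed, not taken from the paper).

    log_lik: maximized log-likelihood of the fitted model
    n: number of samples, p: number of candidate markers
    k: number of markers selected, gamma: penalty weight in [0, 1]
    """
    return -2.0 * log_lik + k * math.log(n) + 2.0 * gamma * log_binomial(p, k)


def select_by_ebic(candidates, n, p, gamma=0.5):
    """Return the label of the candidate fit with the smallest EBIC.

    candidates: list of (label, log_lik, k) tuples, one per hyperparameter
    setting (a hypothetical layout chosen for this sketch).
    """
    best = min(candidates, key=lambda c: ebic(c[1], n, p, c[2], gamma))
    return best[0]
```

With γ = 0 the score reduces to the ordinary BIC; larger γ penalizes model size more heavily when p is much greater than n, which is the setting the abstract describes.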

Conclusions: These results suggest that our model is effective and can cope with high-dimensional omics data.

Keywords: Bayesian variable selection; EM algorithm; Non-small cell lung cancer; Omics; Stomach adenocarcinoma; Survival analysis.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Biomarkers, Tumor / genetics*
  • Computer Simulation
  • Genetic Markers
  • Genomics / statistics & numerical data*
  • Humans
  • Neoplasms / genetics*
  • Neoplasms / mortality*
  • Survival Analysis

Substances

  • Biomarkers, Tumor
  • Genetic Markers