Pathway hunting by random survival forests

Bioinformatics. 2013 Jan 1;29(1):99-105. doi: 10.1093/bioinformatics/bts643. Epub 2012 Nov 4.

Abstract

Motivation: Pathway or gene set analysis has been widely applied to genomic data. Many current pathway testing methods use univariate test statistics calculated from individual genomic markers, which ignores the correlations and interactions between candidate markers. Random forest-based pathway analysis is a promising approach for incorporating complex correlation and interaction patterns, but one limitation of previous approaches is that each pathway has been analysed separately, so cross-talk between pathways was not taken into account.

Results: In this article, we develop a new pathway hunting algorithm for survival outcomes using random survival forests, which prioritizes important pathways while accounting for gene correlation and genomic interactions. We show that the proposed method performs favourably compared with five popular pathway testing methods on both synthetic and real data. We find that the proposed methodology provides an efficient and powerful pathway modelling framework for high-dimensional genomic data.
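The abstract does not spell out the algorithm. As a rough illustration of how a forest-based method can score a whole pathway rather than single genes, the sketch below computes a pathway-level permutation importance: all genes in a pathway are permuted together, using the same row shuffle for each of them, which keeps their within-pathway correlation intact while breaking their association with survival. The score is the resulting drop in Harrell's concordance index. This is a sketch under stated assumptions, not the authors' procedure: `predict` is a hypothetical stand-in for a fitted random survival forest's ensemble risk score, and the names `concordance_index`, `pathway_importance`, the pathway labels, and the toy data are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: among comparable pairs, the fraction in which the
    subject who failed first received the higher risk score (ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def pathway_importance(predict: Callable[[Row], float], X: List[Row],
                       times: Sequence[float], events: Sequence[bool],
                       pathways: Dict[str, List[int]],
                       n_repeats: int = 20, seed: int = 0) -> Dict[str, float]:
    """Mean drop in C-index when all genes of a pathway are permuted jointly.

    Every gene in the pathway takes its value from the same donor row, so
    within-pathway correlation is preserved while the link to survival is broken.
    """
    rng = random.Random(seed)
    baseline = concordance_index(times, events, [predict(r) for r in X])
    scores: Dict[str, float] = {}
    for name, genes in pathways.items():
        total_drop = 0.0
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = []
            for i, row in enumerate(X):
                new_row = list(row)
                for g in genes:
                    new_row[g] = X[perm[i]][g]  # same donor row for every gene in the pathway
                X_perm.append(new_row)
            permuted = concordance_index(times, events, [predict(r) for r in X_perm])
            total_drop += baseline - permuted
        scores[name] = total_drop / n_repeats
    return scores


# Hypothetical toy data: 4 genes, and only genes 0 and 1 (pathway "A") drive risk.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(60)]
times = [math.exp(-(r[0] + r[1])) for r in X]  # higher risk -> earlier failure
events = [i % 4 != 0 for i in range(len(X))]   # every 4th subject is censored


def predict(r: Row) -> float:
    # Hypothetical stand-in for a fitted random survival forest's ensemble risk.
    return r[0] + r[1]


scores = pathway_importance(predict, X, times, events, {"A": [0, 1], "B": [2, 3]})
print(scores)
```

In this toy setting pathway "A", whose genes drive the simulated hazard, gets a positive score, while pathway "B" scores exactly zero because the stand-in predictor ignores its genes. For simplicity the importance is computed on the training data; a random survival forest would typically compute it on out-of-bag samples instead.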

Availability: The R code for the analysis used in this article is available upon request.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Colonic Neoplasms / genetics
  • Colonic Neoplasms / metabolism
  • Colonic Neoplasms / mortality
  • Female
  • Humans
  • Ovarian Neoplasms / genetics
  • Ovarian Neoplasms / metabolism
  • Ovarian Neoplasms / mortality
  • Signal Transduction
  • Survival Analysis*
  • Transcriptome