A regression tree approach to identifying subgroups with differential treatment effects

Stat Med. 2015 May 20;34(11):1818-33. doi: 10.1002/sim.6454. Epub 2015 Feb 5.

Abstract

In the fight against hard-to-treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then, we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the generalized unbiased interaction detection and estimation (GUIDE) approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi-squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by-products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data.
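Ideas (i) and (ii) can be illustrated with a minimal sketch. It is not the GUIDE implementation or any of the three algorithms from the paper. It assumes a two-arm trial with an uncensored numeric response, fits the response on treatment alone, and cross-tabulates the sign of each residual against quartile bins of each candidate covariate. The covariate with the largest Pearson chi-squared statistic is taken as the split candidate. The toy data and all names (`fit_treatment_means`, `quartile_bins`, `chi2_stat`, `x1`, `x2`) are hypothetical.

```python
from collections import Counter

def fit_treatment_means(y, z):
    """Least-squares fit of y on a treatment indicator, i.e. per-arm means."""
    sums, counts = Counter(), Counter()
    for yi, zi in zip(y, z):
        sums[zi] += yi
        counts[zi] += 1
    return {arm: sums[arm] / counts[arm] for arm in counts}

def quartile_bins(x):
    """Assign each value to a bin 0..3 by counting the sample quartiles it exceeds."""
    s = sorted(x)
    n = len(s)
    cuts = [s[n // 4], s[n // 2], s[(3 * n) // 4]]
    return [sum(xi > c for c in cuts) for xi in x]

def chi2_stat(rows, cols):
    """Pearson chi-squared statistic for the rows-by-cols contingency table."""
    n = len(rows)
    cell = Counter(zip(rows, cols))
    row_tot, col_tot = Counter(rows), Counter(cols)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat

# Hypothetical toy trial: treatment helps only when x1 > 5; x2 is irrelevant.
n = 200
z = [i % 2 for i in range(n)]                 # treatment indicator
x1 = [(i // 2) % 10 for i in range(n)]        # covariate that modifies the effect
x2 = [i // 20 for i in range(n)]              # covariate with no role
noise = [(((7 * i) % 11) - 5) / 50 for i in range(n)]  # small deterministic noise
y = [2.0 * zi * (a > 5) + e for zi, a, e in zip(z, x1, noise)]

# Idea (i): treatment is the only linear predictor in the fitted model.
means = fit_treatment_means(y, z)
resid_positive = [yi - means[zi] > 0 for yi, zi in zip(y, z)]

# Idea (ii): chi-squared test of residual sign against each binned covariate.
scores = {name: chi2_stat(resid_positive, quartile_bins(x))
          for name, x in {"x1": x1, "x2": x2}.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing raw statistics is reasonable here only because both tables have the same dimensions. With differing degrees of freedom, p-values would be the fairer basis for comparison. A purely prognostic covariate would also leave a pattern in these residuals, so this sketch does not separate prognostic from predictive variables.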

Keywords: bootstrap; missing values; proportional hazards; selection bias.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Chi-Square Distribution
  • Confidence Intervals
  • Humans
  • Models, Statistical*
  • Neoplasms / therapy*
  • Poisson Distribution
  • Randomized Controlled Trials as Topic
  • Regression Analysis*
  • Selection Bias