Regression Trees for Longitudinal Data with Baseline Covariates

Biostat Epidemiol. 2019;3(1):1-22. doi: 10.1080/24709360.2018.1557797. Epub 2018 Dec 31.

Abstract

Longitudinal changes in a population of interest are often heterogeneous and may be influenced by a combination of baseline factors. In such cases, traditional linear mixed effects models (Laird and Ware, 1982), which assume a common parametric form for the mean structure, may not be applicable. We show that regression tree methodology for longitudinal data can identify and characterize longitudinally homogeneous subgroups. Most currently available regression tree construction methods are either limited to a repeated measures scenario or combine the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under the conditional inference framework (Hothorn, Hornik and Zeileis, 2006) that overcomes these limitations using a two-step approach. The LongCART algorithm first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type-I error controlled, which guards against variable selection bias, over-fitting and spurious splitting. We have obtained asymptotic results for the proposed instability test and examined its finite sample behavior through simulation studies. The comparative performance of the LongCART algorithm was evaluated empirically via simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.

Keywords: Brownian Bridge; Instability test; LongCART; Longitudinal data; Mixed models; Regression tree; Score process.