This article is motivated by the increasing need to model risk for large hospital and health care systems that provide services to diverse and complex patients. Heterogeneity across a population is often determined by a set of factors such as chronic conditions. When these stratifying factors produce overlapping subpopulations, the covariate effects in the overlapping groups are likely to be similar. We exploit this similarity by imposing structural constraints on the importance of variables in predicting outcomes such as hospital admission. Our basic assumption is that if a variable is important for the subpopulation with one of the chronic conditions, then it should also be important for the subpopulation with both conditions. However, a variable can be important for the subpopulation with two particular chronic conditions without being important for the subpopulations of people with just one of those conditions. This assumption and its generalization to more conditions are reasonable and greatly aid in borrowing strength across the subpopulations. We prove an oracle property for our estimation method and show that, even when the structural assumptions are misspecified, our method still includes all of the truly nonzero variables in large samples. We demonstrate the strong performance of our method in extensive numerical studies and in an application to hospital admission prediction and validation for the Medicare population of a large health care provider.
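The structural assumption can be stated as a containment rule on supports: if subpopulation A's set of conditions is contained in subpopulation B's, every variable important for A must also be important for B, while B may have extra variables of its own. The sketch below only checks whether a given support pattern respects that rule; it is not the authors' estimation method. The function name `hierarchy_violations`, the representation of each subpopulation as a `frozenset` of condition labels, and the variable names in the example (`age`, `hba1c`, `bnp`, `dialysis`) are all hypothetical, chosen for illustration.

```python
def hierarchy_violations(support):
    """Find pairs of subpopulations that break the assumed hierarchy.

    Hypothetical representation: `support` maps a frozenset of chronic
    conditions (the subpopulation indexed by that set) to the set of
    variables assumed important (nonzero) for that subpopulation.
    Returns a list of (smaller, larger, missing) triples, where `missing`
    holds the variables important for `smaller` but absent from `larger`.
    """
    violations = []
    for a in support:
        for b in support:
            # a < b is Python's proper-subset test on frozensets:
            # everything important for a must also be important for b.
            if a < b:
                missing = support[a] - support[b]
                if missing:
                    violations.append((a, b, missing))
    return violations


diab = frozenset({"diabetes"})
chf = frozenset({"CHF"})
both = diab | chf

# Allowed pattern: "dialysis" matters only for the group with both
# conditions, which the assumption explicitly permits.
ok = {
    diab: {"age", "hba1c"},
    chf: {"age", "bnp"},
    both: {"age", "hba1c", "bnp", "dialysis"},
}

# Disallowed pattern: "bnp" matters for CHF alone but not for the
# group that has both conditions.
bad = dict(ok)
bad[both] = {"age", "hba1c"}

if __name__ == "__main__":
    print(hierarchy_violations(ok))   # no violations
    print(hierarchy_violations(bad))  # one violation involving "bnp"
```

The rule runs in one direction only. It is violated when a larger condition set drops a variable that a smaller set uses. It is not violated when a larger set adds a variable, which is how the assumption lets the group with both conditions carry effects that neither single-condition group has.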
Keywords: Heterogeneity; Hierarchical penalization; Risk prediction; Variable selection.
© 2017, The International Biometric Society.