Modelling collinear and spatially correlated data

Spat Spatiotemporal Epidemiol. 2016 Aug:18:63-73. doi: 10.1016/j.sste.2016.04.003. Epub 2016 Apr 27.

Abstract

In this work we present a statistical approach to distinguish and interpret the complex relationship between several predictors and a response variable at the small area level, in the presence of (i) high correlation between the predictors and (ii) spatial correlation for the response. Covariates that are highly correlated create collinearity problems when used in a standard multiple regression model. Many methods have been proposed in the literature to address this issue. A very common approach is to create an index which aggregates all the highly correlated variables of interest. For example, it is well known that there is a relationship between air pollution and social deprivation as measured through the Index of Multiple Deprivation (IMD); this index is then used as a confounder in assessing the effect of air pollution on health outcomes (e.g. respiratory hospital admissions or mortality). However, it would be more informative to look specifically at each domain of the IMD and at its relationship with air pollution, to better understand its role as a confounder in epidemiological analyses. In this paper we illustrate how the complex relationships between the domains of the IMD and air pollution can be deconstructed and analysed using profile regression, a Bayesian non-parametric model for clustering responses and covariates simultaneously. Moreover, we include an intrinsic conditional autoregressive (ICAR) term to account for the spatial correlation of the response variable.
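To make the ICAR term mentioned above more concrete, the following is a minimal sketch of the structure matrix that underlies an intrinsic CAR prior, Q = D - W, where W is a binary neighbourhood matrix and D holds the number of neighbours of each area. It is not taken from the paper: the four-area map, the function name `icar_structure_matrix` and the example values of `phi` are assumptions made purely for illustration, and the sketch leaves out the precision parameter and the profile regression component entirely.

```python
import numpy as np


def icar_structure_matrix(adjacency):
    """Return Q = D - W for a symmetric 0/1 adjacency matrix W.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    W = np.asarray(adjacency, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1] or not np.allclose(W, W.T):
        raise ValueError("adjacency must be square and symmetric")
    D = np.diag(W.sum(axis=1))  # diagonal: number of neighbours of each area
    return D - W


# Assumed toy map: four areas in a row, each adjacent to the next (0-1-2-3).
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
Q = icar_structure_matrix(W)

# Illustrative spatial random effects, centred to satisfy the usual
# sum-to-zero constraint that makes the intrinsic prior identifiable.
phi = np.array([0.5, -0.2, 0.1, -0.4])
phi = phi - phi.mean()

# The quadratic form phi' Q phi equals the sum of squared differences
# between neighbouring areas, so the prior penalises local roughness.
quad_form = phi @ Q @ phi
pairwise = sum(
    (phi[i] - phi[j]) ** 2
    for i in range(len(phi))
    for j in range(i + 1, len(phi))
    if W[i, j]
)
```

Each row of Q sums to zero, so Q is rank deficient (rank n - 1 for a connected map). This is what makes the prior "intrinsic", that is, improper, and is why a constraint such as sum-to-zero is typically imposed on the spatial effects.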

Keywords: Bayesian clustering; Collinearity; Index of multiple deprivation; Pollution; Profile regression; Spatial modelling.

MeSH terms

  • Air Pollution / adverse effects
  • Air Pollution / statistics & numerical data*
  • Bayes Theorem
  • Confounding Factors, Epidemiologic
  • Humans
  • London / epidemiology
  • Models, Theoretical*
  • Spatio-Temporal Analysis