Objectives: Many studies of air pollution and health are carried out over several geographical areas, and sometimes over several countries. This paper explores three approaches to analysis in such studies: a non-hierarchical model, a two-stage analysis, and multilevel modelling. Illustrations are given using a preliminary subset of data from the CESAR study.
Design: The Central European Study on Air pollution and Respiratory Health (CESAR) was conducted in 25 areas within six Central European countries, enrolling 20,271 schoolchildren. Pollution averages were calculated for each area. Associations between pollution and health outcomes were estimated under different models.
Main results: A regression analysis of log forced vital capacity (FVC) on PM10, ignoring the geographical hierarchy, estimated a significant mean drop in FVC (adjusted for confounders) of 2.2% (95% CI 0.5% to 1.3%), p=0.007, from the area with the lowest PM10 to that with the highest. A multilevel model (MLM), using data for all children but with random effects at area and country level, estimated a drop of 2.8% (-0.6% to 6.1%), p=0.110. A two-stage analysis, in which mean log FVC (adjusted for confounders) was first estimated for each area using regression and these area means were then regressed on PM10, estimated a drop of 2.6% (-0.5% to 5.5%), p=0.101. Simulation exercises showed the non-hierarchical method to be inadequate in the context of the CESAR study, with only half of all 95% confidence intervals for the estimated PM10 slope containing the true value (i.e., the value used to create the simulated data). The two-stage and multilevel modelling methods gave substantially better results, though both underperformed slightly. All three methods appeared to give unbiased slope estimates.
Conclusions: Acknowledgement of hierarchical structures is essential in statistical inference: standard errors can be substantially incorrect when such structures are ignored. Multilevel, random-effects models correctly address hierarchical structures, though having few units at higher levels can cause convergence problems, especially where complex modelling is required. Two-stage analyses, which also acknowledge the hierarchy, provide simple alternatives to random-effects models.
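
The kind of simulation exercise described under Main results can be illustrated with a minimal sketch. It is not the authors' code: it assumes a balanced design of 25 areas with 40 children each, a single area-level random effect (the country level and all confounders are omitted), and arbitrary illustrative values for the true slope and the two standard deviations. The names `ols_slope` and `simulate_coverage` are hypothetical. The sketch compares how often the 95% confidence interval for the PM10 slope contains the true value under a non-hierarchical regression and under a two-stage analysis.

```python
import math
import random

N_AREAS = 25          # number of areas, as in CESAR
T_CRIT_DF23 = 2.069   # 97.5th percentile of Student's t with N_AREAS - 2 = 23 df
Z_CRIT = 1.96         # normal approximation, adequate for the large child-level sample


def ols_slope(x, y):
    """Return (slope, standard error) from a simple linear regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def simulate_coverage(n_sims=400, n_children=40, true_slope=-0.03,
                      sd_area=0.05, sd_child=0.15, seed=1):
    """Return (naive coverage, two-stage coverage) of nominal 95% intervals.

    All numeric defaults are illustrative assumptions, not CESAR estimates.
    """
    rng = random.Random(seed)
    # Area-level exposure, rescaled so 0 = lowest and 1 = highest PM10 (assumed).
    pm10 = [rng.uniform(0, 1) for _ in range(N_AREAS)]
    hits_naive = 0
    hits_two_stage = 0
    for _ in range(n_sims):
        xs, ys, area_means = [], [], []
        for x in pm10:
            u = rng.gauss(0, sd_area)  # random effect shared by all children in the area
            y_area = [true_slope * x + u + rng.gauss(0, sd_child)
                      for _ in range(n_children)]
            xs.extend([x] * n_children)
            ys.extend(y_area)
            area_means.append(sum(y_area) / n_children)
        # Non-hierarchical analysis: every child treated as independent.
        b, se = ols_slope(xs, ys)
        hits_naive += abs(b - true_slope) <= Z_CRIT * se
        # Two-stage analysis: area means regressed on area-level PM10.
        b, se = ols_slope(pm10, area_means)
        hits_two_stage += abs(b - true_slope) <= T_CRIT_DF23 * se
    return hits_naive / n_sims, hits_two_stage / n_sims


if __name__ == "__main__":
    naive, two_stage = simulate_coverage()
    print(f"naive coverage: {naive:.2f}, two-stage coverage: {two_stage:.2f}")
```

Under these assumptions the non-hierarchical intervals are too narrow, because the shared area effect is treated as independent child-level noise, whereas the two-stage intervals are based on 25 area means and so reflect the between-area variation. The exact coverage figures depend on the assumed variance components and should not be read as a reproduction of the paper's results.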