Community-wide health risk assessment using geographically resolved demographic data: a synthetic population approach

PLoS One. 2014 Jan 28;9(1):e87144. doi: 10.1371/journal.pone.0087144. eCollection 2014.

Abstract

Background: Evaluating environmental health risks in communities requires models characterizing geographic and demographic patterns of exposure to multiple stressors. These exposure models can be constructed from multivariable regression analyses using individual-level predictors (microdata), but these microdata are not typically available with sufficient geographic resolution for community risk analyses given privacy concerns.

Methods: We developed synthetic geographically resolved microdata for a low-income community (New Bedford, Massachusetts) facing multiple environmental stressors. We first applied probabilistic reweighting using simulated annealing to data from the 2006-2010 American Community Survey, combining 9,135 microdata samples from the New Bedford area with census tract-level constraints for individual and household characteristics. We then evaluated the synthetic microdata using goodness-of-fit tests and by examining spatial patterns of microdata fields not used as constraints. As a demonstration, we developed a multivariable regression model predicting smoking behavior as a function of individual-level microdata fields using New Bedford-specific data from the 2006-2010 Behavioral Risk Factor Surveillance System, linking this model with the synthetic microdata to predict demographic and geographic smoking patterns in New Bedford.
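As a rough illustration of the reweighting step, the sketch below uses simulated annealing to pick survey records for a single census tract so that the selected records' category counts approach the tract-level constraint totals. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the absolute-error objective, the geometric cooling schedule, and the toy records and constraint totals are all hypothetical.

```python
import math
import random


def total_absolute_error(selection, pool, constraints):
    """Sum of |synthetic count - target count| over all constraint categories."""
    counts = {key: 0 for key in constraints}
    for idx in selection:
        for label in pool[idx]:
            if label in counts:
                counts[label] += 1
    return sum(abs(counts[key] - constraints[key]) for key in constraints)


def anneal_tract(pool, constraints, n_people, steps=5000,
                 t_start=5.0, cooling=0.999, seed=0):
    """Select n_people records (with replacement) from pool for one tract.

    Assumed objective and cooling schedule; the paper's abstract does not
    specify either.
    """
    rng = random.Random(seed)
    selection = [rng.randrange(len(pool)) for _ in range(n_people)]
    error = total_absolute_error(selection, pool, constraints)
    best, best_error = list(selection), error
    temperature = t_start
    for _ in range(steps):
        if best_error == 0:
            break  # constraints matched exactly
        # Propose swapping one selected record for a random record from the pool.
        slot = rng.randrange(n_people)
        old = selection[slot]
        selection[slot] = rng.randrange(len(pool))
        new_error = total_absolute_error(selection, pool, constraints)
        delta = new_error - error
        # Always accept improvements; accept worse fits with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            error = new_error
            if error < best_error:
                best, best_error = list(selection), error
        else:
            selection[slot] = old  # reject the swap
        temperature *= cooling
    return best, best_error


# Toy records: each is a tuple of category labels (hypothetical, not ACS data).
pool = [
    ("female", "adult"),
    ("female", "under_18"),
    ("male", "adult"),
    ("male", "under_18"),
]
# Hypothetical tract-level constraint totals for a tract of 10 people.
constraints = {"female": 6, "male": 4, "adult": 7, "under_18": 3}

selection, error = anneal_tract(pool, constraints, n_people=10)
```

Running the same routine tract by tract would yield one list of selected records per tract. A real application would use many more constraint tables and a tuned cooling schedule.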

Results: Our simulation produced microdata representing all 94,944 individuals living in New Bedford in 2006-2010. Variables in the synthetic population matched the constraints well at the census tract level (e.g., ancestry, gender, age, education, household income) and reproduced the census-derived spatial patterns of non-constraint microdata. Smoking in New Bedford was significantly associated with numerous demographic variables found in the microdata, with estimated tract-level smoking rates varying from 20% (95% CI: 17%, 22%) to 37% (95% CI: 30%, 45%).
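The tract-level smoking rates come from linking the regression model to the synthetic microdata. The sketch below shows one way such a linkage could work, assuming a logistic model form (the abstract says only "multivariable regression"). The coefficient values, predictor names, and three-person population are made-up placeholders, not estimates or data from the study, and the sketch does not reproduce the confidence intervals.

```python
import math
from collections import defaultdict

# Hypothetical placeholder coefficients on the log-odds scale.
# These are NOT estimates from the study.
COEFFICIENTS = {"intercept": -1.0, "male": 0.3, "no_college": 0.5}


def smoking_probability(person):
    """Predicted smoking probability under the assumed logistic model."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["male"] * person["male"]
    z += COEFFICIENTS["no_college"] * person["no_college"]
    return 1.0 / (1.0 + math.exp(-z))


def tract_smoking_rates(population):
    """Average the predicted probabilities of all synthetic individuals per tract."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person in population:
        totals[person["tract"]] += smoking_probability(person)
        counts[person["tract"]] += 1
    return {tract: totals[tract] / counts[tract] for tract in totals}


# Toy synthetic population (hypothetical tract labels and records).
synthetic_population = [
    {"tract": "A", "male": 1, "no_college": 1},
    {"tract": "A", "male": 0, "no_college": 0},
    {"tract": "B", "male": 0, "no_college": 1},
]

rates = tract_smoking_rates(synthetic_population)
```

Because each synthetic individual carries a tract identifier, the same averaging can be repeated over any demographic subgroup to examine demographic as well as geographic patterns.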

Conclusions: We used simulation methods to create geographically resolved individual-level microdata that can be used in community-wide exposure and risk assessment studies. This approach provides insights into community-scale exposure and vulnerability patterns, which are valuable in settings where policy can be informed by characterization of multi-stressor exposures and health risks at high resolution.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Demography*
  • Environmental Exposure / analysis
  • Female
  • Geography
  • Health Surveys*
  • Humans
  • Male
  • Massachusetts
  • Models, Theoretical
  • Multivariate Analysis
  • Poverty
  • Regression Analysis
  • Risk Assessment
  • Smoking

Grants and funding

This research has been supported by a grant from the U.S. Environmental Protection Agency's Science to Achieve Results (STAR) program. Although the research described in the article has been funded wholly or in part by the U.S. Environmental Protection Agency's STAR program through grant RD83457702, it has not been subjected to any EPA review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.