Validation of a geospatial aggregation method for congressional districts and other US administrative geographies

Ben R Spoer; Alexander S Chen; Taylor M Lampe; Isabel S Nelson; Anne Vierse; Noah V Zazanis; Byoungjun Kim; Lorna E Thorpe; Subu V Subramanian; Marc N Gourevitch

doi:10.1016/j.ssmph.2023.101511

SSM Popul Health. 2023 Sep 4;24:101511. doi: 10.1016/j.ssmph.2023.101511. eCollection 2023 Dec.

Affiliations

¹ New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA.
² Harvard T.H. Chan School of Public Health, Department of Social and Behavioral Sciences, Boston, MA, USA.
³ New York University Grossman School of Medicine, Department of Population Health, New York, NY, USA.

Abstract

Stakeholders need data on health and drivers of health parsed to the boundaries of essential policy-relevant geographies. US Congressional Districts are an example of a policy-relevant geography that generally lacks health data. One strategy to generate Congressional District health metric estimates is to aggregate estimates from other geographies, for example, from counties or census tracts to Congressional Districts. Doing so requires several methodological decisions. We refined a method to aggregate health metric estimates from one geography to another using a population-weighted approach. We evaluated the method's accuracy by comparing aggregated estimates of three metrics (Broadband Access, High School Completion, and Unemployment) to estimates from the US Census American Community Survey for the same years. We then conducted four sensitivity analyses testing the effect of aggregating counts vs. percentages, the impacts of component geography size and data missingness, and the extent of population overlap between component and target geographies. Aggregated estimates were very similar to estimates for identical metrics drawn directly from the data source. Sensitivity analyses suggest the following best practices for Congressional District-based metrics: using smaller, more plentiful geographies like census tracts rather than larger, less plentiful geographies like counties, despite the potential for less stable estimates in smaller geographies; and favoring geographies with a higher percentage of population overlap.

Keywords: Congressional districts; Geospatial analysis; Spatial data aggregation; US administrative geographies.
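
Illustrative sketch: The population-weighted aggregation described in the abstract can be illustrated with a minimal Python example. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact weighting formula, so the code assumes each component geography (e.g., a census tract) contributes its percentage estimate weighted by the population it shares with the target geography (e.g., a Congressional District). The function name, the crosswalk layout, and the "coverage" field are hypothetical. Components with missing estimates are skipped and the weights renormalized, which is one simple way to handle missingness.

```python
from collections import defaultdict


def aggregate_to_targets(estimates, crosswalk):
    """Population-weighted mean of component estimates per target geography.

    estimates: dict mapping component_id -> percentage estimate, or None if missing
    crosswalk: iterable of (component_id, target_id, overlap_population) tuples,
               where overlap_population is the component's population that
               falls inside the target (hypothetical input layout)
    """
    weighted_sum = defaultdict(float)  # sum of estimate * overlap population
    weight_used = defaultdict(float)   # overlap population with a non-missing estimate
    weight_total = defaultdict(float)  # all overlap population, missing or not

    for component_id, target_id, overlap_pop in crosswalk:
        weight_total[target_id] += overlap_pop
        value = estimates.get(component_id)
        if value is None:
            continue  # missing component: excluded, weights renormalize below
        weighted_sum[target_id] += value * overlap_pop
        weight_used[target_id] += overlap_pop

    results = {}
    for target_id, total in weight_total.items():
        used = weight_used[target_id]
        results[target_id] = {
            # None when no component with data overlaps the target
            "estimate": weighted_sum[target_id] / used if used > 0 else None,
            # share of the target's population covered by non-missing components
            "coverage": used / total if total > 0 else 0.0,
        }
    return results


# Toy data (invented for illustration only): three tracts, two districts.
tract_estimates = {"A": 80.0, "B": 60.0, "C": None}
tract_to_district = [
    ("A", "CD1", 3000),
    ("B", "CD1", 1000),
    ("B", "CD2", 2000),
    ("C", "CD2", 2000),
]
result = aggregate_to_targets(tract_estimates, tract_to_district)
print(result)
```

In this toy example, district CD1 is fully covered by tracts with data, while CD2 has only half of its population covered because tract C is missing. Reporting a coverage share alongside each estimate is one way to flag targets where missingness or low population overlap may make the aggregate less reliable.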