Retail environments, such as healthcare locations, food stores, and recreation facilities, may be relevant to many health behaviors and outcomes. However, the lack of guidance on how to collect, process, aggregate, and link these data leads to inconsistent or incomplete measurement, which can introduce misclassification bias and limit replication of existing research. We describe the following steps to leverage business data for longitudinal neighborhood health research: re-geolocating establishment addresses, preliminary classification using standard industrial codes, systematic checks to refine classifications, incorporation and integration of complementary data sources, documentation of a flexible hierarchical classification system and variable naming conventions, and linking to neighborhoods and participant residences. We show results of this classification for a dataset of over 77 million establishment locations across the contiguous U.S. from 1990 to 2014. By incorporating complementary data sources, manual spot checks in Google Street View, and word and name searches, we enhanced a basic classification that used only standard industrial codes. Ultimately, providing these enhanced longitudinal data and supplying detailed methods for researchers to replicate our work promote consistency, replicability, and new opportunities in neighborhood health research.
Keywords: Businesses; Classification; Cohort studies; Commercial; Food environment; GIS or geographic information systems; Geography; Physical activity destinations; Place and health.