De Novo Exposomic Geospatial Assembly of Chronic Disease Regions with Machine Learning & Network Analysis

medRxiv [Preprint]. 2024 Jul 26:2024.07.25.24310832. doi: 10.1101/2024.07.25.24310832.

Abstract

Background: Determining spatial relationships between disease and the exposome is limited by available methodologies. aPEER (algorithm for Projection of Exposome and Epidemiological Relationships) uses machine learning (ML) and network analysis to find spatial relationships between diseases and the exposome in the United States.

Methods: Using aPEER, we examined the relationship between 12 chronic diseases and 186 pollutants. PCA, K-means clustering, and map projection produced clusters of counties derived from pollutants, and we calculated the Jaccard correlation between these clusters and counties with high rates of disease. Pollution correlation matrices were combined with network analysis to identify the strongest disease-pollution relationships. Results were compared to LISA, Moran's I, univariate, elastic net, and random forest regression.
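The clustering and overlap steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the county and pollutant counts, the number of principal components, the number of clusters, and the top-quartile definition of "high disease" are all assumptions chosen for illustration, and the map projection step is omitted. The sketch uses scikit-learn's PCA and KMeans and computes the Jaccard index as |A ∩ B| / |A ∪ B|.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Synthetic stand-in data (assumption): 300 counties x 6 pollutants.
rng = np.random.default_rng(0)
n_counties, n_pollutants = 300, 6
region = rng.integers(0, 3, size=n_counties)        # hidden region per county
centers = rng.normal(0, 3, size=(3, n_pollutants))  # pollutant profile per region
pollutants = centers[region] + rng.normal(0, 1, size=(n_counties, n_pollutants))

# 1. Standardize the pollutant matrix, then reduce it with PCA.
scaled = StandardScaler().fit_transform(pollutants)
scores = PCA(n_components=2).fit_transform(scaled)

# 2. Group counties into pollution-derived clusters with K-means.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# 3. Hypothetical disease rate, elevated in region 0; the top quartile of
#    counties is treated as "high disease" (an assumed threshold).
disease_rate = rng.normal(10, 1, size=n_counties) + 3 * (region == 0)
threshold = np.quantile(disease_rate, 0.75)
high_disease = set(np.flatnonzero(disease_rate >= threshold))

# 4. Jaccard overlap between each pollution cluster and the high-disease set.
overlap = {
    int(k): jaccard(set(np.flatnonzero(labels == k)), high_disease)
    for k in np.unique(labels)
}
best_cluster = max(overlap, key=overlap.get)
print(overlap, best_cluster)
```

With real data, the rows would be counties keyed by FIPS code, and the cluster with the highest Jaccard value would be the candidate pollution-derived region for that disease.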

Findings: aPEER produced 68,820 maps with human-interpretable, distinct pollution-derived regions. Diseases with the strongest pollution associations were hypertension (J=0.5316, p=3.89×10⁻²⁰⁸), COPD (J=0.4545, p=8.27×10⁻¹³¹), stroke (J=0.4517, p=1.15×10⁻¹²⁷), stroke mortality (J=0.4445, p=4.28×10⁻¹²⁵), and diabetes mellitus (J=0.4425, p=2.34×10⁻¹²⁷). Methanol, acetaldehyde, and formaldehyde were identified as strongly associated with stroke, COPD, stroke mortality, hypertension, and diabetes mellitus in the southeast United States (which correlated with both the Stroke Belt and the Diabetes Belt). Pollutants were strongly predictive of chronic disease geography and outperformed conventional prediction models based on preventive services and social determinants of health (using elastic net and random forest regression).

Interpretation: aPEER used machine learning to identify disease and air pollutant relationships with similar or superior AUCs compared to social determinants of health (SDOH) and healthcare preventive service models. These findings highlight the utility of aPEER in epidemiological and geospatial analysis as well as the emerging role of exposomics in understanding chronic disease pathology.
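As a rough illustration of how AUCs from two feature sets might be compared, the sketch below scores a pollutant-based model and an SDOH-based model on the same outcome using cross-validated predictions. Everything in it is an assumption: the data are synthetic, the feature counts are arbitrary, and a random forest classifier on a binary "high disease" label stands in for the regression models named in the abstract. It reproduces no result from the study and says nothing about which feature set performs better on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data (assumption): 400 counties, 5 features per set.
rng = np.random.default_rng(1)
n_counties = 400
X_pollution = rng.normal(size=(n_counties, 5))  # hypothetical pollutant features
X_sdoh = rng.normal(size=(n_counties, 5))       # hypothetical SDOH features

# Synthetic outcome driven by one column of each feature set plus noise;
# the top quartile is labeled "high disease" (an assumed threshold).
signal = X_pollution[:, 0] + X_sdoh[:, 0] + rng.normal(size=n_counties)
y = (signal > np.quantile(signal, 0.75)).astype(int)


def cv_auc(X: np.ndarray, y: np.ndarray) -> float:
    """AUC from 5-fold out-of-fold predicted probabilities."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


auc_pollution = cv_auc(X_pollution, y)
auc_sdoh = cv_auc(X_sdoh, y)
print(f"pollution AUC={auc_pollution:.3f}, SDOH AUC={auc_sdoh:.3f}")
```

Scoring both models on out-of-fold predictions keeps the comparison from rewarding whichever feature set overfits the training counties more.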

Funding: Boston Public Health Commission, NHLBI (R03 HL157890) and the CDC.

Publication types

  • Preprint