Generating High Spatial Resolution Exposure Estimates from Sparse Regulatory Monitoring Data

Atmos Environ (1994). 2023 Nov 15;313:120076. doi: 10.1016/j.atmosenv.2023.120076. Epub 2023 Sep 12.

Abstract

Random forest algorithms have been used extensively to estimate ambient air pollutant concentrations. However, the accuracy of their predictions can suffer from extrapolation problems when only limited measurement data are available to train the models. In this study, we developed and evaluated two approaches, both incorporating low-cost sensor data, to enhance the extrapolation ability of random forest models in areas with sparse monitoring data. The study area was Rochester, NY, the site of a pregnancy cohort study. Daily PM2.5 concentrations from the National Air Monitoring Stations/State and Local Air Monitoring Stations (NAMS/SLAMS) sites served as the response variable, with satellite, meteorological, and land-use variables as predictors. To improve the base random forest models, we drew on PM2.5 measurements from a pre-existing low-cost sensor network and conducted a two-step backward selection that progressively eliminated variables with potential emission heterogeneity from the base models. As a second approach, we applied the regression-enhanced random forest (RERF) method. Finally, we evaluated the PM2.5 predictions from both approaches against contemporaneous urinary 1-hydroxypyrene levels. The two-step approach increased the average external-validation R² from 0.49 to 0.65 and reduced the RMSE from 3.56 to 2.96 μg/m³. The RERF models achieved an average external-validation R² of 0.54 and an RMSE of 3.40 μg/m³. Urinary 1-hydroxypyrene levels were significantly associated with the PM2.5 predictions from both improved models, and the associations were of comparable strength. This PM2.5 modeling strategy could improve the extrapolation ability of random forest models in areas with sparse monitoring data.
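The extrapolation problem and the regression-enhanced idea can be illustrated with a toy sketch. RERF, as usually described, fits a linear (typically Lasso-penalized) regression first and then a random forest on that regression's residuals. The linear term can therefore carry predictions beyond the range of the training responses, which a forest alone cannot do because every leaf predicts an average of training values. The stdlib-only Python below rests on loud assumptions: one synthetic predictor, an ordinary least-squares linear stage instead of a penalized one, and bagged regression trees standing in for a full random forest (with a single predictor, random feature subsampling has no effect). The data and all function names (`fit_linear`, `fit_forest`, `fit_rerf`, and so on) are hypothetical and do not represent the authors' implementation.

```python
import math
import random
import statistics

# Toy illustration only: hypothetical helpers, synthetic data, OLS (not Lasso)
# for the linear stage, and bagged trees on one predictor as the "forest".


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth, min_leaf):
    """Greedy regression tree; nodes are ("leaf", mean) or ("split", t, left, right)."""
    if depth == 0 or len(ys) < 2 * min_leaf:
        return ("leaf", statistics.fmean(ys))
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return ("leaf", statistics.fmean(ys))
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left, right = pairs[:i], pairs[i:]
    return ("split", threshold,
            build_tree([x for x, _ in left], [y for _, y in left], depth - 1, min_leaf),
            build_tree([x for x, _ in right], [y for _, y in right], depth - 1, min_leaf))


def predict_tree(node, x):
    """Walk from the root to a leaf and return that leaf's mean response."""
    while node[0] == "split":
        node = node[2] if x <= node[1] else node[3]
    return node[1]


def fit_forest(xs, ys, n_trees=25, depth=4, min_leaf=2, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([xs[i] for i in idx], [ys[i] for i in idx], depth, min_leaf))
    return trees


def predict_forest(trees, x):
    """Ensemble prediction: average of the individual tree predictions."""
    return statistics.fmean([predict_tree(t, x) for t in trees])


def fit_rerf(xs, ys, **forest_kwargs):
    """Stage 1: linear fit. Stage 2: tree ensemble on the linear residuals."""
    a, b = fit_linear(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, fit_forest(xs, residuals, **forest_kwargs)


def predict_rerf(model, x):
    """Linear trend plus the ensemble's residual correction."""
    a, b, trees = model
    return a + b * x + predict_forest(trees, x)


# Synthetic training data on x in [0, 10]; predict out of range at x = 15.
xs = [0.25 * i for i in range(41)]
ys = [2.0 * x + math.sin(x) for x in xs]
rf_pred = predict_forest(fit_forest(xs, ys), 15.0)
rerf_pred = predict_rerf(fit_rerf(xs, ys), 15.0)
print(f"forest only: {rf_pred:.2f}  RERF: {rerf_pred:.2f}  max training y: {max(ys):.2f}")
```

Because every leaf averages training responses, the forest-only prediction at x = 15 cannot exceed the largest training value, whereas the RERF prediction follows the linear trend beyond it. This mirrors the extrapolation problem in miniature; it says nothing about how the models in this study were configured.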

Keywords: Fine particulate matter (PM2.5); Low-cost sensor; Random Forest; Regression-Enhanced Random Forest.