A novel ensemble-based statistical approach to estimate daily wildfire-specific PM2.5 in California (2006-2020)

Environ Int. 2023 Jan:171:107719. doi: 10.1016/j.envint.2022.107719. Epub 2022 Dec 24.

Abstract

Though fine particulate matter (PM2.5) has decreased in the United States (U.S.) over the past two decades, the increasing frequency, duration, and severity of wildfires significantly (though episodically) impair air quality in wildfire-prone regions and beyond. Increasing PM2.5 concentrations derived from wildfire smoke, and their associated impacts on public health, require dedicated epidemiological studies. The main sources of PM2.5 data are government-operated monitors sparsely located across the U.S., leaving several regions and potentially vulnerable populations unmonitored. Current approaches to estimating PM2.5 concentrations in unmonitored areas often rely on big data, such as satellite-derived aerosol properties and meteorological variables, apply computationally intensive deterministic modeling, and do not distinguish wildfire-specific PM2.5 from other sources of emissions such as traffic and industrial sources. Furthermore, modeling wildfire-specific PM2.5 is challenging because measurements of the smoke contribution to PM2.5 pollution are not available. Here, we aim to use statistical methods to isolate wildfire-specific PM2.5 from other sources of emissions. Our study presents an ensemble model that optimally combines multiple machine learning algorithms (including gradient boosting machine, random forest, and deep learning) and a large set of explanatory variables to first estimate daily PM2.5 concentrations at the ZIP code level, a relevant spatiotemporal resolution for epidemiological studies. Subsequently, we propose a novel implementation of an imputation approach to estimate wildfire-specific PM2.5 concentrations that could be applied to other geographical regions in the U.S. or worldwide. Our ensemble model achieved results comparable to previous machine learning studies of PM2.5 prediction while avoiding the processing of larger, computationally intensive datasets. Our study is the first to apply a suite of statistical models using readily available datasets to provide daily wildfire-specific PM2.5 at a fine spatial scale for a 15-year period, thus providing a relevant spatiotemporal resolution and a timely contribution for epidemiological studies.
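
The abstract describes the second step only as "an imputation approach" and gives no details. One possible reading, offered here purely as an assumption, is that PM2.5 on smoke-affected days is treated as missing, a non-smoke background value is imputed for those days, and the wildfire-specific part is the total minus that imputed background. The minimal Python sketch below illustrates that reading. The function name `wildfire_specific_pm25`, the boolean `smoke_day` flags, and the use of the median of non-smoke days as a stand-in for a real imputation model are all hypothetical and are not taken from the study.

```python
from statistics import median


def wildfire_specific_pm25(total_pm25, smoke_day):
    """Split daily total PM2.5 into an imputed background and a smoke part.

    total_pm25: daily total PM2.5 estimates (ug/m3) for one location.
    smoke_day:  booleans, True where a day is flagged as smoke-affected.
                (Hypothetical input: the abstract does not say how such
                days are identified.)
    Returns a list of wildfire-specific PM2.5 values, one per day.
    """
    if len(total_pm25) != len(smoke_day):
        raise ValueError("total_pm25 and smoke_day must have the same length")

    # Treat smoke days as "missing" and keep only non-smoke observations.
    non_smoke = [v for v, s in zip(total_pm25, smoke_day) if not s]
    if not non_smoke:
        raise ValueError("need at least one non-smoke day to impute background")

    # ASSUMPTION: a simple median stands in for the imputation model.
    background = median(non_smoke)

    result = []
    for value, is_smoke in zip(total_pm25, smoke_day):
        if is_smoke:
            # Smoke contribution = total minus imputed background, floored at 0.
            result.append(max(value - background, 0.0))
        else:
            # Non-smoke days are assigned no wildfire-specific PM2.5.
            result.append(0.0)
    return result


# Toy data for a single location over six days (made-up values).
daily_total = [8.0, 10.0, 45.0, 60.0, 12.0, 9.0]
smoke_flags = [False, False, True, True, False, False]
wildfire_pm25 = wildfire_specific_pm25(daily_total, smoke_flags)
print(wildfire_pm25)
```

In practice, the median would be replaced by a model that uses temporal and spatial information, and the daily totals would come from the first-step ensemble predictions rather than hand-entered values.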

Keywords: Air pollution; Human health; Machine learning; PM2.5; Wildfire.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • California
  • Particulate Matter / analysis
  • Smoke / adverse effects
  • United States
  • Wildfires*

Substances

  • Air Pollutants
  • Particulate Matter
  • Smoke