A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information

Gongbo Chen; Shanshan Li; Luke D Knibbs; N A S Hamm; Wei Cao; Tiantian Li; Jianping Guo; Hongyan Ren; Michael J Abramson; Yuming Guo

doi:10.1016/j.scitotenv.2018.04.251

Sci Total Environ. 2018 Sep 15:636:52-60. doi: 10.1016/j.scitotenv.2018.04.251. Epub 2018 Apr 25.

Authors

Gongbo Chen¹, Shanshan Li¹, Luke D Knibbs², N A S Hamm³, Wei Cao⁴, Tiantian Li⁵, Jianping Guo⁶, Hongyan Ren⁴, Michael J Abramson¹, Yuming Guo⁷

Affiliations

¹ Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
² Department of Epidemiology and Biostatistics, School of Public Health, The University of Queensland, Brisbane, Australia.
³ Geospatial Research Group and School of Geographical Sciences, Faculty of Science and Engineering, University of Nottingham, Ningbo, China.
⁴ Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China.
⁵ National Institute of Environmental Health Sciences, Chinese Center for Disease Control and Prevention, Beijing, China.
⁶ State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing, China.
⁷ Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.

PMID: 29702402
DOI: 10.1016/j.scitotenv.2018.04.251

Abstract

Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM_2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at a daily time scale and at the national level in China.

Objectives: To estimate daily concentrations of PM_2.5 across China during 2005-2016.

Methods: Daily ground-level PM_2.5 data were obtained from 1479 stations across China during 2014-2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (a non-parametric machine learning algorithm) and two traditional regression models were developed to estimate ground-level PM_2.5 concentrations. The best-fit model was then used to estimate daily concentrations of PM_2.5 across China at a resolution of 0.1° (≈10 km) during 2005-2016.
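As a rough illustration of what a 0.1° grid means on the ground, the sketch below converts the grid resolution to kilometres and maps a point to its grid cell. It is not the authors' code: the function names `cell_size_km` and `grid_cell`, the constant `KM_PER_DEGREE_LAT` (an approximate value of about 111 km per degree of latitude), and the choice of a simple latitude/longitude grid anchored at 0°, 0° are assumptions made for illustration.

```python
import math

# Assumption: approximate length of one degree of latitude, in km.
KM_PER_DEGREE_LAT = 111.32


def cell_size_km(lat_deg, res_deg=0.1):
    """Approximate (north-south, east-west) size in km of one grid cell.

    The east-west extent shrinks with the cosine of latitude.
    """
    north_south = res_deg * KM_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def grid_cell(lat_deg, lon_deg, res_deg=0.1):
    """Return the (row, col) index of the grid cell containing a point.

    Hypothetical indexing: cells are counted from 0 deg latitude and
    0 deg longitude in steps of res_deg.
    """
    return math.floor(lat_deg / res_deg), math.floor(lon_deg / res_deg)


# Example: cell size at 35 deg N and the cell of a point near Beijing.
ns_km, ew_km = cell_size_km(35.0)
row, col = grid_cell(39.93, 116.41)
```

At 35°N this gives a cell of roughly 11 km north-south by 9 km east-west, consistent with the "≈10 km" stated in the Methods.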

Results: The daily random forests model showed much higher predictive accuracy than the two traditional regression models, explaining the majority of spatial variability in daily PM_2.5 [10-fold cross-validation (CV) R² = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m³]. At the monthly and annual time scales, the explained variability of average PM_2.5 increased up to 86% (RMSE = 10.7 μg/m³ and 6.9 μg/m³, respectively).
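The two reported metrics can be sketched as follows. This is a minimal, stdlib-only illustration of 10-fold cross-validated R² and RMSE, not the authors' pipeline. Three things are assumptions: a one-predictor least-squares line stands in for the random forests model, R² is taken as 1 − SS_res/SS_tot on the out-of-fold predictions (the paper may define it differently), and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model (NOT a random forest): one-predictor least squares."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validated_predictions(xs, ys, k=10, seed=0):
    """Predict each sample with a model fitted without that sample's fold."""
    preds = [0.0] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared prediction error, in the units of obs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Synthetic data for illustration only (not values from the study).
rng = random.Random(42)
aod = [rng.uniform(0.1, 1.5) for _ in range(200)]
pm25 = [20 + 60 * a + rng.gauss(0, 10) for a in aod]

preds = cross_validated_predictions(aod, pm25, k=10)
cv_r2 = r_squared(pm25, preds)
cv_rmse = rmse(pm25, preds)
```

Because every prediction comes from a model that never saw the held-out fold, `cv_r2` and `cv_rmse` estimate out-of-sample performance rather than goodness of fit on the training data.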

Conclusions: Taking advantage of a novel application of a modeling framework and the most recent ground-level PM_2.5 observations, the machine learning method showed higher predictive ability than that reported in previous studies.

Capsule: A random forests approach can be used to estimate historical exposure to PM_2.5 in China with high accuracy.

Keywords: Aerosol optical depth; China; Machine learning; PM_2.5; Random forests.

MeSH terms

Air Pollutants / analysis*
Air Pollution / statistics & numerical data*
China
Environmental Monitoring / methods*
Machine Learning*
Particulate Matter / analysis*
Remote Sensing Technology*

Substances

Air Pollutants
Particulate Matter