Predicting ambient PM2.5 concentrations in Ulaanbaatar, Mongolia with machine learning approaches

J Expo Sci Environ Epidemiol. 2021 Jul;31(4):699-708. doi: 10.1038/s41370-020-0257-8. Epub 2020 Aug 3.

Abstract

Background: Accurately assessing individual exposure to ambient air pollution is a crucial part of epidemiological studies examining the adverse health effects of poor air quality. This is particularly challenging in developing countries with high levels of air pollution, mostly because monitoring networks are sparse and consistent data are lacking.

Methods: We evaluated the performance of six machine learning algorithms in predicting fine particulate matter (PM2.5) concentrations in Ulaanbaatar, Mongolia, using data from 2010 to 2018. Based on performance metrics, the algorithms produced robust results.

Results: Random forest (RF) and gradient boosting models performed best, with a leave-one-location-out cross-validated R² of 0.82 when using data from the entire study period. When the tuned models were applied to the hold-out test set, R² increased to 0.96 for the RF and 0.90 for the gradient boosting model. We also predicted PM2.5 concentrations for each administrative area (khoroo) of the city using RF; maps of the predictions show spatiotemporal variations that are in line with the location of the high-emission area (ger district), the city center, and population density.
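
To illustrate what a leave-one-location-out cross-validated R2 measures, the following is a minimal sketch, not the study's implementation. It holds out all observations from one monitoring location at a time, fits a model on the remaining locations, and pools the out-of-sample predictions into a single R2. Everything in it is an assumption made for illustration: the data are synthetic, the single predictor (temperature) is hypothetical, and a simple least-squares line stands in for the random forest and gradient boosting models named in the abstract.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the RF model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_location_out(records):
    """records: list of (location, x, y) tuples.

    Each location is held out once; predictions for the held-out
    location come from a model trained on all other locations.
    Returns R2 computed over the pooled held-out predictions.
    """
    locations = sorted({loc for loc, _, _ in records})
    y_true, y_pred = [], []
    for held_out in locations:
        train = [(x, y) for loc, x, y in records if loc != held_out]
        test = [(x, y) for loc, x, y in records if loc == held_out]
        model = fit_line([x for x, _ in train], [y for _, y in train])
        for x, y in test:
            y_true.append(y)
            y_pred.append(model(x))
    return r_squared(y_true, y_pred)


# Synthetic example (NOT study data): three hypothetical sites,
# x = temperature in deg C, y = PM2.5 in ug/m3.
records = [
    ("site_A", -20, 160), ("site_A", -10, 130), ("site_A", 0, 100),
    ("site_B", -20, 170), ("site_B", -10, 140), ("site_B", 0, 110),
    ("site_C", -20, 150), ("site_C", -10, 120), ("site_C", 0, 90),
]
r2 = leave_one_location_out(records)
print(f"Leave-one-location-out R2: {r2:.3f}")
```

Because every prediction is made for a location the model never saw during fitting, this score reflects how well a model transfers to unmonitored places, which is the relevant question when predicting concentrations for areas without a monitor.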

Conclusion: Our results provide evidence of the advantages and feasibility of machine learning approaches for predicting ambient PM2.5 levels in a setting with limited resources and extreme air pollution levels.

Keywords: Air Pollution; Environmental Monitoring; Exposure Modeling; Particulate Matter.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Cities
  • Environmental Monitoring
  • Humans
  • Machine Learning
  • Mongolia
  • Particulate Matter / analysis

Substances

  • Air Pollutants
  • Particulate Matter