Mapping of high-resolution daily particulate matter (PM2.5) concentration at the city level through a machine learning-based downscaling approach

Phuong D M Nguyen; An H Phan; Truong X Ngo; Bang Q Ho; Tran Vu Pham; Thanh T N Nguyen

doi:10.1007/s10661-024-13562-6


Environ Monit Assess. 2024 Dec 23;197(1):94. doi: 10.1007/s10661-024-13562-6.

Authors

Phuong D M Nguyen^#¹, An H Phan^#¹, Truong X Ngo¹, Bang Q Ho², Tran Vu Pham³, Thanh T N Nguyen⁴

Affiliations

¹ Faculty of Information Technology, University of Engineering and Technology, Vietnam National University Hanoi, E3 Building, 144 Xuan Thuy Street, Dich Vong Hau Ward, Cau Giay District, Ha Noi, 100000, Vietnam.
² Department of Academic Affairs, Vietnam National University, 142 To Hien Thanh St, District 10, Ho Chi Minh City, 700000, Vietnam.
³ Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, Ho Chi Minh City, 700000, Vietnam.
⁴ Faculty of Information Technology, University of Engineering and Technology, Vietnam National University Hanoi, E3 Building, 144 Xuan Thuy Street, Dich Vong Hau Ward, Cau Giay District, Ha Noi, 100000, Vietnam.

^# Contributed equally.

PMID: 39714636

Abstract

PM2.5 pollution is a major global concern, especially in Vietnam, because of its harmful effects on health and the environment. Monitoring local PM2.5 levels is crucial for assessing air quality. However, Vietnam's state-of-the-art (SOTA) dataset, with a 3 km resolution, needs to be refined to depict spatial variation accurately in smaller regions. In this research, we investigated machine learning-based downscaling methods to improve the spatial resolution and quality of Vietnam's existing 3 km PM2.5 products using two families of approaches: traditional machine learning models (random forest, XGBoost, CatBoost, support vector regression (SVR), and a mixed effect model (MEM)) and deep learning models (long short-term memory (LSTM), convolutional neural network (CNN), and convolutional LSTM (ConvLSTM)). Overall, the CatBoost 2-day lag model performed best. In terms of modeling, integrating temporal factors into tree-based models can enhance predictive accuracy. Furthermore, on small datasets, traditional machine learning models outperform more complex deep learning approaches. Machine and deep learning models should also be validated on the PM2.5 maps they generate, because they can achieve very high evaluation scores yet produce maps that are unrealistic in application. Compared with the SOTA PM2.5 maps for Vietnam and the SOTA global maps, the maps produced by the proposed CatBoost 2-day lag model showed a 57% increase in the correlation coefficient (Pearson R), as well as 42-73%, 28-75%, and 39-75% reductions in root mean squared error (RMSE), mean relative error (MRE), and mean absolute error (MAE), respectively. Additionally, the daily, monthly, and annual average maps generated by the CatBoost 2-day lag model effectively capture the spatial distribution and seasonal variations of PM2.5 in Ho Chi Minh City. These findings indicate a substantial improvement in the accuracy and reliability of downscaled PM2.5 maps.
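The abstract reports Pearson R, RMSE, MRE, and MAE and a "2-day lag" CatBoost configuration, but does not give formulas. The sketch below shows one common way to compute these metrics and to build lagged daily features. It assumes that MRE is the mean of |prediction - observation| / observation (observations strictly positive) and that a "2-day lag" model adds each predictor's values from the two previous days; both are assumptions rather than the authors' stated definitions, and the helper names (`add_lag_features`, `percent_change`, etc.) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def add_lag_features(values: Sequence[float], max_lag: int = 2) -> List[Dict[str, float]]:
    """Hypothetical helper: pair each day's value with its previous `max_lag` days.

    Days without a full lag history (the first `max_lag` days) are skipped.
    """
    rows = []
    for t in range(max_lag, len(values)):
        row = {"lag0": values[t]}  # same-day value
        for k in range(1, max_lag + 1):
            row[f"lag{k}"] = values[t - k]  # value k days earlier
        rows.append(row)
    return rows


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (sd_o * sd_p)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mre(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean relative error (assumed definition; requires obs > 0)."""
    return sum(abs(p - o) / o for o, p in zip(obs, pred)) / len(obs)


def percent_change(baseline: float, new: float) -> float:
    """Relative change in percent; negative values are reductions."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    obs = [10.0, 20.0, 30.0]   # illustrative station PM2.5 values, not study data
    pred = [12.0, 18.0, 33.0]  # illustrative model predictions
    print(pearson_r(obs, pred), rmse(obs, pred), mae(obs, pred), mre(obs, pred))
    print(add_lag_features([1.0, 2.0, 3.0, 4.0]))
```

Under these assumptions, a reported "42% reduction in RMSE" corresponds to `percent_change(rmse_baseline, rmse_new)` of about -42, and a "57% increase" in Pearson R to a value of about +57.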

Keywords: Deep learning; Downscaling; Ho Chi Minh City; Machine learning; PM2.5.

MeSH terms

Air Pollutants* / analysis
Air Pollution* / statistics & numerical data
Cities*
Environmental Monitoring* / methods
Machine Learning*
Particulate Matter* / analysis
Vietnam

Substances

Particulate Matter
Air Pollutants
