Improving the construction and prediction strategy of the Air Quality Health Index (AQHI) using machine learning: A case study in Guangzhou, China

Lei Zhang; Yuanyuan Chen; Hang Dong; Di Wu; Sili Chen; Xin Li; Boheng Liang; Qiaoyuan Yang

doi:10.1016/j.ecoenv.2024.117287

Ecotoxicol Environ Saf. 2024 Nov 8:287:117287. doi: 10.1016/j.ecoenv.2024.117287. Online ahead of print.

Authors

Lei Zhang¹, Yuanyuan Chen², Hang Dong², Di Wu², Sili Chen¹, Xin Li³, Boheng Liang⁴, Qiaoyuan Yang⁵

Affiliations

¹ Department of Preventive Medicine, School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou 511436, China.
² Guangzhou Center for Disease Control and Prevention, Guangzhou 510440, China.
³ Institute of Toxicology, Guangdong Provincial Center for Disease Control and Prevention, 160 Qunxian Road, Panyu District, Guangzhou 511430, China.
⁴ Guangzhou Center for Disease Control and Prevention, Guangzhou 510440, China.
⁵ Department of Preventive Medicine, School of Public Health, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou 511436, China; Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.

PMID: 39520752
DOI: 10.1016/j.ecoenv.2024.117287

Abstract

Effectively capturing the risk of air pollution and informing residents is vital to public health. The widely used Air Quality Index (AQI) has been criticized for failing to accurately represent the non-threshold linear relationship between air pollution and health outcomes. Although the Air Quality Health Index (AQHI) was developed to address these limitations, it lacks comprehensive construction criteria. This work proposed a novel construction and prediction strategy for the AQHI using machine learning methods. Our RF-Alasso-QGC method integrated Random Forest (RF), Adaptive Lasso (Alasso), and Quantile-based G-Computation (QGC) for effective pollutant selection and AQHI construction. The RF-Alasso method excluded CO and identified PM₁₀, PM₂.₅, NO₂, SO₂, and O₃ as major contributors to mortality. The QGC method controlled for the additive and synergistic effects among these air pollutants. Compared with the Standard-AQHI, the new RF-Alasso-QGC-AQHI demonstrated a stronger correlation with health outcomes, with an interquartile range (IQR) increase associated with a 1.80% (1.44%, 2.17%) increase in total mortality, and the best goodness of fit. Additionally, the hybrid Autoregressive Integrated Moving Average-Long Short-Term Memory (ARIMA-LSTM) model successfully forecast the new AQHI, achieving a coefficient of determination (R²) of 0.961. The work demonstrated that the improved AQHI construction and prediction strategy communicates the health risks of multiple air pollutants more efficiently and provides early warnings of those risks.
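The two summary statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes a log-linear (Poisson-type) mortality model, which is a common choice in AQHI work but is not stated in the abstract; the function names and all inputs are hypothetical and are not taken from the study.

```python
import math
import numpy as np


def percent_increase_per_iqr(beta, iqr):
    """Percent change in expected daily deaths for an IQR rise in the index.

    Assumption (not from the abstract): log E[deaths] = ... + beta * index,
    so an IQR rise multiplies the expected count by exp(beta * iqr).
    """
    return (math.exp(beta * iqr) - 1.0) * 100.0


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical coefficient and IQR, chosen only to exercise the formula.
    print(percent_increase_per_iqr(beta=0.009, iqr=2.0))
    # Hypothetical observed and forecast index values.
    print(r_squared([3.1, 4.0, 5.2, 4.4], [3.0, 4.2, 5.0, 4.5]))
```

Under this assumption, the reported 1.80% corresponds to a rate ratio of 1.018 per IQR increase in the index, and an R² of 0.961 means the forecasts leave about 3.9% of the variance in the observed index unexplained.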

Keywords: Adaptive Lasso; Air pollutant selection; Air quality health index; Health risk prediction; Quantile-based G-Computation; Random Forest.