To better understand surface water quality for sustainable urban management, we developed a machine learning (ML) modeling framework to predict the distribution of water quality in the Howrah Municipal Corporation (HMC) area in West Bengal, India. The framework comprised six base models, namely Random Forest (RF), Cubist, Extreme Gradient Boosting (XGB), Multivariate Adaptive Regression Splines (MARS), Gradient Boosting Machine (GBM), and Support Vector Machine (SVM), together with two hybrid stacking ensembles, stacking RF (SE-RF) and stacking Cubist (SE-Cubist). Additionally, we employed the ReliefF and Shapley Additive exPlanations (SHAP) methods to identify the factors driving water quality. We first estimated the water quality index (WQI) from seven water quality parameters: total hardness (TH), pH, total dissolved solids (TDS), dissolved oxygen (DO), biochemical oxygen demand (BOD), calcium (Ca), and magnesium (Mg). Six independent factors, namely precipitation (Pr), maximum temperature (Tmax), minimum temperature (Tmin), Normalized Difference Turbidity Index (NDTI), Normalized Difference Chlorophyll Index (NDCI), and TDS, were then used to predict and map WQI with the different ML models. The SE-Cubist model outperformed the other ML models: during the testing phase, it achieved the best results, with R² = 0.975, RMSE = 0.351, and MAE = 0.197. The ReliefF and SHAP analyses identified Pr and Tmax as the most influential factors for WQI within the study area.
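For readers unfamiliar with the three reported evaluation metrics, the following minimal Python sketch shows how R², RMSE, and MAE are commonly computed from observed and predicted values. It is an illustration only, not the implementation used in this study: the function name `regression_metrics` and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), which the abstract does not specify.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed/predicted values.

    Assumption: R2 is the coefficient of determination,
    1 - SS_res / SS_tot; the source does not state which definition it uses.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares

    rmse = math.sqrt(ss_res / n)                          # root mean squared error
    mae = sum(abs(r) for r in residuals) / n              # mean absolute error
    # R2 is undefined when the observations have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Invented values for illustration; not data from the study.
    wqi_observed = [1.0, 2.0, 3.0, 4.0]
    wqi_predicted = [1.0, 2.0, 3.0, 5.0]
    print(regression_metrics(wqi_observed, wqi_predicted))
```

Higher R² and lower RMSE and MAE indicate closer agreement between predicted and observed WQI, which is the sense in which one model is said to outperform another on the test set.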
Keywords: Bengal; Howrah municipal corporation; Machine learning; SE-Cubist; Shapley additive exPlanations; Water quality index.
Copyright © 2024 Elsevier Ltd. All rights reserved.