Spatial prediction of groundwater salinity in multiple aquifers of the Mekong Delta region using explainable machine learning models

Water Res. 2024 Sep 6:266:122404. doi: 10.1016/j.watres.2024.122404. Online ahead of print.

Abstract

Groundwater salinization is prevalent in coastal regions, yet predicting it accurately and understanding its causal factors remain challenging because of the complexity of groundwater systems. This study therefore predicted groundwater salinity in the multi-layered aquifers spanning the entire Mekong Delta (MD) region using machine learning (ML) models trained on an in situ dataset, with three salinity indicators (Cl⁻, pH, and HCO₃⁻) as targets. We applied nine decision tree-based models and evaluated their prediction performance. The models were trained using 13 input variables: weather (2), hydrogeological conditions (4), water levels (3), groundwater usage (2), and relative distance from water sources (2). Model interpretation techniques were then used to quantify the importance of each factor in the model predictions. The Extra Trees model performed best and generalized well in predicting Cl⁻ concentration, whereas the Bagging and Random Forest models outperformed the other models in predicting pH and HCO₃⁻ concentration. The coefficients of determination were 0.94, 0.67, and 0.78 for Cl⁻, pH, and HCO₃⁻, respectively. Additionally, the model interpretation identified significant factors, which differed by target variable and aquifer. In particular, it identified the salinity indicators and aquifers that were strongly influenced by human use of groundwater. By providing accurate spatial predictions and interpretations of groundwater salinity in the MD, this research can serve as a foundation for formulating effective groundwater management policies to control groundwater salinization.
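
The abstract reports coefficients of determination and a ranking of input factors but does not name the interpretation technique. As a minimal sketch only, the following Python computes R² and a permutation-style importance (the mean drop in R² when one input column is shuffled). Permutation importance is swapped in here purely for illustration and is not claimed to be the authors' method. The data, the three-feature layout, and the names `toy_model`, `r2_score`, and `permutation_importance` are hypothetical stand-ins, not the study's dataset or models.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return the baseline R^2 and, per feature, the mean drop in R^2
    observed when that feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data (NOT the study's data): 3 features, with the
# target driven mostly by feature 0, weakly by feature 1, not by feature 2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + 0.5 * row[1] + data_rng.gauss(0, 0.1) for row in X]


def toy_model(row):
    """Stand-in for a fitted regressor such as a tree ensemble."""
    return 3.0 * row[0] + 0.5 * row[1]


baseline, importances = permutation_importance(toy_model, X, y)
print(f"baseline R^2: {baseline:.3f}")
for j, imp in enumerate(importances):
    print(f"feature {j}: mean R^2 drop = {imp:.3f}")
```

In this toy setup the feature the model relies on most shows the largest drop, and a feature the model ignores shows none. With a real fitted model, the same procedure would be run on held-out data for each target variable and aquifer separately.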

Keywords: Explainable models; Groundwater quality; Machine learning models; Salinity; Spatiotemporal prediction.