A comparative study on urban waterlogging susceptibility assessment based on multiple data-driven models

J Environ Manage. 2024 Jun:360:121166. doi: 10.1016/j.jenvman.2024.121166. Epub 2024 May 22.

Abstract

Accurately identifying urban waterlogging areas and assessing waterlogging susceptibility are crucial for hazard prevention and control. Data-driven models forecast waterlogging areas by learning the complex relationships between explanatory variables and waterlogging states. This approach addresses the limitations of mechanistic models, which are frequently complex and unable to incorporate socio-economic factors. Previous research predominantly employed a single type of data-driven model to predict waterlogging locations and evaluate its effectiveness. Comprehensive performance comparisons and uncertainty analyses across different types of models are scarce, and interpretability analysis is lacking. The study area was the central area of Beijing, which is prone to waterlogging. Given the high manpower, time, and economic costs of collecting waterlogging information, the waterlogging point distribution map released by the Beijing Water Affairs Bureau was used as the source of labeled samples. Twelve factors affecting waterlogging susceptibility were chosen as explanatory variables to construct four models: Random Forest (RF), Support Vector Machine with Radial Basis Function (SVM-RBF), Particle Swarm Optimization-Weakly Labeled Support Vector Machine (PSO-WELLSVM), and Maximum Entropy (MaxEnt). Because assessing model performance with different single evaluation indicators (such as F-score, Kappa, and AUC) may yield conflicting results, the Distance between Indices of Simulation and Observation (DISO) was chosen as a comprehensive measure of each model's performance in predicting waterlogging points (a lower DISO indicates better performance). PSO-WELLSVM exhibited the highest performance with a DISOtest value of 0.63, outperforming MaxEnt (0.78), which excelled in identifying areas highly susceptible to waterlogging, including extremely high susceptibility zones. The SVM-RBF and RF models demonstrated suboptimal performance and exhibited overfitting.
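The idea behind a combined measure such as DISO can be illustrated with a minimal sketch. It assumes the generic form of the index, the Euclidean distance between a model's indicator values and their ideal values; the abstract does not state which indicators were combined or how they were normalized, so the indicator names, the ideal values, and the two sets of scores below are purely hypothetical and are not results from the study.

```python
import math


def diso(indicators, ideal):
    """Euclidean distance between indicator values and their ideal values.

    Assumes every indicator is already scaled so that `ideal` marks a
    perfect score; smaller distances mean better overall agreement.
    """
    return math.sqrt(sum((indicators[name] - ideal[name]) ** 2 for name in ideal))


# Hypothetical indicator values for two unnamed models (illustration only).
ideal = {"f_score": 1.0, "kappa": 1.0, "auc": 1.0}
model_a = {"f_score": 0.80, "kappa": 0.60, "auc": 0.85}
model_b = {"f_score": 0.85, "kappa": 0.50, "auc": 0.80}

scores = {"model_a": diso(model_a, ideal), "model_b": diso(model_b, ideal)}
best = min(scores, key=scores.get)  # the smallest distance wins
print(scores, best)
```

In this toy case model_b has the higher F-score while model_a has the higher Kappa and AUC, so the single indicators disagree; the combined distance gives one ranking.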
The waterlogging susceptibility distribution maps predicted by the four models showed significant spatial differences, owing to variations in computational principles and input parameter complexities. Integrating the four waterlogging susceptibility assessment models (WSAMs) through logistic regression significantly decreased the uncertainty associated with any single data-driven model and identified the most flood-prone areas. To improve the interpretability of the data-driven models, a geographical detector was used to quantify the explanatory power of the 12 variables and to shed light on the waterlogging process. Building Density (BD) had the highest explanatory power for waterlogging susceptibility (Q value = 0.202), followed by Distance to Road, Frequency of Heavy Rainstorms (FHR), and DEM, among others. The interaction between BD and FHR results in a nonlinear increase in the explanatory power for waterlogging susceptibility. Waterlogging susceptibility in the study area can be attributed to the interactions of multiple factors.
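The Q value reported by a geographical detector can be sketched as follows, assuming the standard factor-detector definition q = 1 - (sum over strata of N_h * var_h) / (N * var), and the usual interaction rule that two factors enhance each other nonlinearly when the q of their overlaid strata exceeds the sum of their individual q values. The function names and the toy data are hypothetical; the abstract does not describe how the 12 variables were discretized into strata.

```python
from collections import defaultdict


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def q_statistic(y, strata):
    """Factor-detector q: share of the variance of y explained by the strata."""
    total = len(y) * variance(y)
    if total == 0:
        raise ValueError("y has zero variance; q is undefined")
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(g) * variance(g) for g in groups.values())
    return 1 - within / total


def interaction_q(y, strata_a, strata_b):
    """q of the strata formed by overlaying two factors."""
    return q_statistic(y, list(zip(strata_a, strata_b)))


# Toy data (illustration only): y depends on the two factors jointly.
factor_a = [0, 0, 1, 1]
factor_b = [0, 1, 0, 1]
y = [0, 1, 1, 0]

q_a = q_statistic(y, factor_a)
q_b = q_statistic(y, factor_b)
q_ab = interaction_q(y, factor_a, factor_b)
nonlinear_enhancement = q_ab > q_a + q_b
print(q_a, q_b, q_ab, nonlinear_enhancement)
```

Here neither factor explains anything on its own, while their overlay explains all of the variance, which is an extreme case of the nonlinear enhancement described for BD and FHR.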

Keywords: DISO; Machine learning; MaxEnt; Uncertainty analysis; Urban waterlogging susceptibility.

MeSH terms

  • Beijing
  • Floods
  • Models, Theoretical*
  • Support Vector Machine