Surface water quality index forecasting using multivariate complementing approach reinforced with locally weighted linear regression model

Environ Sci Pollut Res Int. 2024 May;31(22):32382-32406. doi: 10.1007/s11356-024-33027-0. Epub 2024 Apr 23.

Abstract

River water quality management and monitoring are essential responsibilities for communities near rivers. Government decision-makers should monitor important quality factors such as temperature, dissolved oxygen (DO), pH, and biochemical oxygen demand (BOD). Among water quality parameters, the 5-day BOD is an important index, but measuring it requires a significant amount of time and effort, which is a source of concern in both academic and commercial settings. Traditional experimental and statistical methods are either not accurate enough or too slow to address this problem. This study used a hybrid model called MVMD-LWLR to forecast BOD in the Klang River, Malaysia. The hybrid model combines a locally weighted linear regression (LWLR) model that uses a wavelet-based kernel function with multivariate variational mode decomposition (MVMD) for decomposing the input variables. In addition, categorical boosting (CatBoost) feature selection was used to identify and extract the significant input variables. This combination of MVMD-LWLR and CatBoost is the first use of such a complete model for predicting BOD levels in this river environment. The gradient-based optimization (GBO) algorithm was used to fine-tune the model parameters and improve the overall accuracy of the BOD predictions. To assess the robustness of the proposed method, we compared it with other popular models: kernel ridge (KRidge) regression, LASSO, elastic net, and Gaussian process regression (GPR). Several metrics, comprising root-mean-square error (RMSE), correlation coefficient (R), uncertainty coefficient at the 95% level (U95%), and Nash-Sutcliffe efficiency (NSE), as well as visual interpretation, were used to evaluate the predictive efficacy of the hybrid models. Extensive testing revealed that the MVMD-LWLR model outperformed its competitors in forecasting BOD. Consequently, the proposed MVMD-LWLR model optimized with the GBO algorithm yields encouraging and reliable results for BOD forecasting, with increased accuracy and minimal error.
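
For readers unfamiliar with LWLR, the following is a minimal sketch of the core idea together with three of the evaluation metrics named above (RMSE, R, NSE). It is not the authors' implementation. It assumes a Gaussian kernel in place of the wavelet-based kernel used in the study, it omits the MVMD decomposition, CatBoost feature selection, and GBO tuning, and the function names and the bandwidth parameter `tau` are hypothetical choices made for illustration.

```python
import numpy as np


def lwlr_predict(x_query, X, y, tau=0.5):
    """Predict the target at x_query with locally weighted linear regression.

    Assumption: a Gaussian kernel with bandwidth tau stands in for the
    wavelet-based kernel described in the abstract.
    """
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Design matrix with an intercept column.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    xq = np.concatenate([[1.0], x_query])
    # Kernel weights: training points near the query get larger weights.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    # Weighted least squares: solve (X^T W X) theta = X^T W y.
    XtW = Xb.T * w
    theta = np.linalg.pinv(XtW @ Xb) @ (XtW @ y)
    return float(xq @ theta)


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def pearson_r(obs, pred):
    """Pearson correlation coefficient (R)."""
    return float(np.corrcoef(obs, pred)[0, 1])


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of obs."""
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - np.mean(obs)) ** 2))


if __name__ == "__main__":
    # Synthetic data for illustration only; not taken from the study.
    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 6.0, 80).reshape(-1, 1)
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
    pred = np.array([lwlr_predict(x, X, y, tau=0.4) for x in X])
    print("RMSE:", rmse(y, pred))
    print("R:", pearson_r(y, pred))
    print("NSE:", nse(y, pred))
```

A separate linear model is fitted for every query point, so a smaller `tau` makes the fit more local and more flexible, while a larger `tau` approaches ordinary linear regression. An NSE of 1 indicates a perfect match between predictions and observations.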

Keywords: Industrial cities; Multivariate variational mode decomposition; Reinforced learning; Surface water quality; Tropical region.

MeSH terms

  • Environmental Monitoring / methods
  • Forecasting
  • Linear Models
  • Malaysia
  • Rivers* / chemistry
  • Water Quality*