Leveraging explainable machine learning for enhanced management of lake water quality

Sajad Soleymani Hasani; Mauricio E Arias; Hung Q Nguyen; Osama M Tarabih; Zachariah Welch; Qiong Zhang

doi:10.1016/j.jenvman.2024.122890

J Environ Manage. 2024 Nov:370:122890. doi: 10.1016/j.jenvman.2024.122890. Epub 2024 Oct 13.

Authors

Sajad Soleymani Hasani¹, Mauricio E Arias², Hung Q Nguyen¹, Osama M Tarabih¹, Zachariah Welch³, Qiong Zhang¹

Affiliations

¹ Department of Civil and Environmental Engineering, University of South Florida, 4202 E Fowler Ave, Tampa, FL, 33620, USA.
² Department of Civil and Environmental Engineering, University of South Florida, 4202 E Fowler Ave, Tampa, FL, 33620, USA.
³ South Florida Water Management District, 3301 Gun Club Rd, West Palm Beach, FL, 33406, USA.

PMID: 39405849

Abstract

Freshwater lakes worldwide suffer from eutrophication caused by excessive nutrient loads, particularly nitrogen (N) and phosphorus (P) from wastewater and runoff, affecting aquatic life and public health. Using a large (1800 km²) subtropical lake as an example (Lake Okeechobee, Florida, USA), this study aims to (1) predict key water quality parameters using machine learning (ML) algorithms based on easily measurable variables, (2) identify spatial patterns of these parameters, and (3) determine environmental drivers influencing turbidity levels. The study employs four ML algorithms, namely Extreme Gradient Boosting (XGB), Light Gradient-Boosting Machine (LGBM), Support Vector Regression (SVR), and Random Forests (RFs), to predict total phosphorus (TP), total nitrogen (TN), nitrate + nitrite (NOx-N), and turbidity, via station-specific and lake-wide modeling approaches. The station-specific models uncover spatial patterns, while the lake-wide models support operational decision-making. Results indicated that lake stage (water level), water temperature, and, most notably, turbidity were the main predictors of nutrient levels, with XGB demonstrating superior prediction performance. Spatial analysis using K-means clustering identified three distinct lake regions based on nutrient levels and turbidity. Given the importance of turbidity as a predictor, SHapley Additive exPlanations (SHAP) were employed to identify and quantify the environmental factors affecting it. Inflows and lake stage were found to be the primary drivers of turbidity near lake inlets, while wind speed and air temperature affected turbidity in the middle of the lake. This research advances the understanding of lake water quality dynamics, emphasizing the importance of frequent monitoring of turbidity and its environmental drivers for enhanced management and future mitigation efforts.

Keywords: Explainable machine learning; Lake Okeechobee; Nitrogen; Phosphorus; Turbidity.
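
The abstract attributes turbidity to environmental drivers using SHAP, which is built on Shapley values. As a rough illustration of that underlying idea only, the sketch below computes exact Shapley values for a single sample of a toy turbidity function. Everything in it is an assumption made for illustration: the function `toy_turbidity`, its coefficients, the feature names, and the baseline and sample values are hypothetical and are not taken from the study, and the code uses brute-force subset enumeration rather than the SHAP library or the authors' models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features missing from a coalition are replaced by their baseline value.
    Cost grows as 2^n in the number of features, so this only suits toy cases.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with coalition features taken from x, the rest from baseline.
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_turbidity(v):
    # Hypothetical response: additive inflow and wind terms plus an
    # inflow-by-stage interaction. Coefficients are invented for illustration.
    return (5.0 + 2.0 * v["inflow"] + 1.5 * v["wind_speed"]
            + 0.5 * v["inflow"] * v["lake_stage"])


# Hypothetical baseline and sample, in arbitrary units.
baseline = {"inflow": 0.0, "lake_stage": 0.0, "wind_speed": 0.0}
sample = {"inflow": 2.0, "lake_stage": 4.0, "wind_speed": 3.0}

phi = shapley_values(toy_turbidity, sample, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.2f}")
```

In this toy case the attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between inflow and lake stage. Practical SHAP implementations approximate or specialize this computation for models such as gradient-boosted trees instead of enumerating every feature subset.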

MeSH terms

Algorithms
Environmental Monitoring / methods
Eutrophication
Florida
Lakes*
Machine Learning*
Nitrogen* / analysis
Phosphorus* / analysis
Water Quality*

Substances

Phosphorus
Nitrogen