Comparison of machine learning algorithms to predict dissolved oxygen in an urban stream

Environ Sci Pollut Res Int. 2023 Jul;30(32):78075-78096. doi: 10.1007/s11356-023-27481-5. Epub 2023 Jun 2.

Abstract

Water quality monitoring in urban watersheds is critical for identifying the negative impacts of urbanization. This study sought to identify a successful predictive machine learning model that uses a minimal set of parameters from easy-to-deploy, low-cost sensors, as the basis for a monitoring system for the urban stream network of Hunnicutt Creek in Clemson, SC, USA. A multiple linear regression model was compared with four machine learning algorithms: k-nearest neighbor, decision tree, random forest, and gradient boosting. The algorithms were evaluated to determine which best predicted dissolved oxygen (DO) from water temperature, conductivity, turbidity, and water level change at four locations along the urban stream. The random forest algorithm performed best in predicting DO at all four sites, with Nash-Sutcliffe model efficiency coefficient (NSE) scores > 0.9 at three sites and > 0.598 at the fourth site. Further examination of the random forest model using explainable artificial intelligence (XAI) showed that temperature influenced the DO predictions at three of the four sites, but that water quality interactions differed by site location. Calculating the land cover type in each site's sub-watershed revealed that different amounts of impervious surface and vegetation influenced water quality and the resulting DO predictions. Overall, machine learning combined with land cover data helps decision-makers better understand the nuances of urban watersheds and the relationships between urban land cover and water quality.
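
The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and default hyperparameters. It is not the authors' pipeline: the data are synthetic stand-ins rather than the Hunnicutt Creek measurements, the linear relationship used to generate DO is hypothetical, and the 75/25 train/test split is an assumption. The NSE is computed as 1 minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / squared deviations from the observed mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - sse / spread


# Synthetic stand-in data (assumption): NOT the study's sensor records.
rng = np.random.default_rng(0)
n = 500
temperature = rng.uniform(5, 30, n)       # water temperature
conductivity = rng.uniform(50, 300, n)    # conductivity
turbidity = rng.uniform(0, 50, n)         # turbidity
level_change = rng.normal(0, 0.05, n)     # water level change
X = np.column_stack([temperature, conductivity, turbidity, level_change])

# Hypothetical DO response, used only so the example has something to learn.
y = 14.0 - 0.3 * temperature - 0.01 * turbidity + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The five model families named in the abstract, with default settings.
models = {
    "multiple linear regression": LinearRegression(),
    "k-nearest neighbor": KNeighborsRegressor(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and score it on the held-out data with NSE.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = nse(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: NSE = {score:.3f}")
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the model does no better than predicting the observed mean. Because the synthetic response here is linear by construction, the ranking printed by this sketch says nothing about the ranking reported in the study.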

Keywords: Artificial intelligence; Machine learning; Regression; Water quality; Water resources; Watershed.

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Environmental Monitoring* / methods
  • Machine Learning
  • Oxygen
  • Rivers*

Substances

  • Oxygen