Machine learning models for prediction of nutrient concentrations in surface water in an agricultural watershed

J Environ Manage. 2024 Dec:372:123305. doi: 10.1016/j.jenvman.2024.123305. Epub 2024 Nov 19.

Abstract

Prediction and quantification of nutrient concentrations in surface water have gained substantial attention in recent decades because excess nutrients released from agricultural and urban watersheds can significantly deteriorate surface water quality. Machine learning (ML) models are considered an effective tool for better understanding and characterizing nutrient release from agricultural fields to surface water. However, to date, no systematic investigations have examined the implementation of different classification and regression ML models in agricultural settings to predict nutrient concentrations in surface water using a group of input variables including climatological (e.g., precipitation), hydrological (e.g., stream flow) and field characteristics (i.e., land and crop use). In the current study, multiple classification (e.g., decision trees) and regression (e.g., regression trees) ML models were applied to a dataset pertaining to surface water quality in an agricultural watershed in southern Ontario, Canada (the Upper Parkhill watershed). The target variables of these models were the nutrient concentrations in surface water, namely nitrate, total phosphorus, soluble reactive phosphorus, and total dissolved phosphorus. These target variables were predicted using physical and chemical parameters of surface water (e.g., temperature and dissolved oxygen (DO)) together with climatological, hydrological, and field conditions as the input variables. The performance of the models was assessed using various evaluation metrics, such as classification accuracy (CA) for the classification models and the coefficient of determination (R²) for the regression models. In general, ensemble bagged trees and logistic regression (CA ≥ 0.72) were the best-performing classification algorithms, and exponential Gaussian process regression (R² ≥ 0.93) was the best-performing regression algorithm, yielding the highest prediction accuracy for the target variables.
The insights and outcomes of the current study demonstrate that ML models can be employed to effectively predict and quantify nutrient concentrations in surface waters to supplement field-collected monitoring data in agricultural watersheds, helping to maintain the quality of available surface water resources.
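
For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how classification accuracy (CA) and the coefficient of determination are commonly computed. It is not the authors' code: the function names and all sample values are hypothetical and chosen only for illustration, and the study's own definitions or software may differ in detail.

```python
from typing import Sequence


def classification_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the observed class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical concentration classes (not data from the study).
observed_classes = ["low", "high", "high", "low", "high"]
predicted_classes = ["low", "high", "low", "low", "high"]
ca = classification_accuracy(observed_classes, predicted_classes)

# Hypothetical concentrations in arbitrary units (not data from the study).
observed_conc = [2.0, 4.0, 6.0, 8.0]
predicted_conc = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(observed_conc, predicted_conc)

print(f"CA = {ca:.2f}, R^2 = {r2:.2f}")
```

With these made-up inputs, four of the five class labels match and the regression residuals are small relative to the spread of the observations, so both metrics come out high. A CA of 1 or a coefficient of determination of 1 would indicate perfect agreement between predictions and observations.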

Keywords: Agricultural watershed; Machine learning algorithms; Model predictions; Nutrient concentrations; Surface water; Water quality.

MeSH terms

  • Agriculture*
  • Machine Learning*
  • Models, Theoretical
  • Nutrients / analysis
  • Phosphorus / analysis
  • Water
  • Water Quality

Substances

  • Phosphorus
  • Water