Setting nutrient boundaries to protect aquatic communities: The importance of comparing observed and predicted classifications using measures derived from a confusion matrix

Sci Total Environ. 2024 Feb 20:912:168872. doi: 10.1016/j.scitotenv.2023.168872. Epub 2023 Nov 25.

Abstract

Defining nutrient thresholds that protect and support the ecological integrity of aquatic ecosystems is a fundamental step in maintaining their natural biodiversity and preserving their resilience. With increasing catchment pressures and climate change, it is more important than ever to develop clear methods to establish thresholds for status classification and management of waters. This must often be achieved using complex data and should be robust to interference from additional pressures as well as ameliorating or confounding conditions. We use both artificial and real data to examine challenges in setting nutrient thresholds in unbalanced and skewed data. We found significant advantages to using binary logistic regression over other techniques. However, one of the key challenges is objectively selecting a probability from which to derive the nutrient threshold. For this purpose, the examination of the proportions of matching and mismatching status classifications of nutrients and a biological quality element using a confusion matrix is a key step that should be more widely adopted in threshold selection. We examined a large array of statistical measures of classification accuracy and their performance over combinations of skewness and imbalance in the data. The most appropriate threshold probability is a compromise between maximising overall classification accuracy and reducing mismatches expressed as commission (false positives) without excessive omission (false negatives). An application to a lake type indicated total phosphorus thresholds that would be around 50 μg l⁻¹ lower than the threshold achieved by an 'unguided' approach, indicating that this approach is a very significant development meriting attention from national authorities responsible for water management.
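The link between a chosen probability, the resulting nutrient threshold, and the confusion matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary logistic model of the form logit(P(impacted)) = b0 + b1 · log10(total phosphorus), with hypothetical coefficients, and it assumes that "positive" means biology worse than good status. The function names, the example data, and the exact definitions of commission (FP / (FP + TP)) and omission (FN / (FN + TP)) are illustrative assumptions; the abstract does not specify them.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def threshold_from_probability(b0, b1, p):
    """Nutrient concentration at which the assumed model predicts probability p.

    Assumed model (hypothetical): logit(P(impacted)) = b0 + b1 * log10(conc).
    """
    return 10 ** ((logit(p) - b0) / b1)


def _ratio(numerator, denominator):
    """Safe division: returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


def confusion_measures(concentrations, bio_impacted, threshold):
    """Compare nutrient-based and biology-based status classifications.

    A site is predicted 'impacted' when its concentration exceeds the threshold;
    bio_impacted holds the observed biological classification (True = impacted).
    """
    n_tp = n_fp = n_fn = n_tn = 0
    for conc, impacted in zip(concentrations, bio_impacted):
        predicted = conc > threshold
        if predicted and impacted:
            n_tp += 1
        elif predicted and not impacted:
            n_fp += 1  # commission: nutrient says impacted, biology does not
        elif impacted:
            n_fn += 1  # omission: biology says impacted, nutrient does not
        else:
            n_tn += 1
    total = n_tp + n_fp + n_fn + n_tn
    return {
        "tp": n_tp, "fp": n_fp, "fn": n_fn, "tn": n_tn,
        "accuracy": _ratio(n_tp + n_tn, total),
        "commission": _ratio(n_fp, n_fp + n_tp),  # assumed definition
        "omission": _ratio(n_fn, n_fn + n_tp),    # assumed definition
    }


if __name__ == "__main__":
    # Hypothetical coefficients and made-up data, for illustration only.
    b0, b1 = -6.0, 4.0
    conc = [10, 20, 25, 40, 60, 80]                   # total phosphorus
    impacted = [False, False, True, False, True, True]
    for p in (0.25, 0.5, 0.75):
        thr = threshold_from_probability(b0, b1, p)
        print(p, round(thr, 1), confusion_measures(conc, impacted, thr))
```

Scanning a range of candidate probabilities in this way shows how the derived threshold and the balance of commission against omission shift together, which is the trade-off the abstract describes.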

Keywords: Accuracy; Binary logistic regression; Categorical models; Classification measures; Lakes; Nutrient standards; Rivers; WFD.

MeSH terms

  • Biodiversity
  • Ecosystem*
  • Lakes*
  • Nutrients
  • Phosphorus
  • Water

Substances

  • Water
  • Phosphorus