Machine learning for modeling N2O emissions from wastewater treatment plants: Aligning model performance, complexity, and interpretability

Water Res. 2023 Oct 15:245:120667. doi: 10.1016/j.watres.2023.120667. Epub 2023 Sep 24.

Abstract

Nitrous oxide (N2O) emissions may account for up to 80 % of a wastewater treatment plant's (WWTP) total carbon footprint. Because the pathways involved are complex, mechanistic models still often fail to depict N2O emission dynamics precisely. Data-driven methods for predicting N2O emissions hold substantial potential as an alternative. So far, however, a comprehensive approach has been lacking, which impedes full-scale application. This study therefore develops a comprehensive approach for using machine learning to perform online process modeling of N2O emissions and tests it on a long-term N2O emission dataset from a full-scale WWTP. Uniquely, the proposed approach considers not only model accuracy but also model complexity, computational speed, and interpretability, equipping operators with the insights needed for informed corrective actions. Algorithms with varying levels of complexity and interpretability were considered, including k-Nearest Neighbors (kNN), decision trees, ensemble learning models, and deep neural networks (DNN). Furthermore, a parametric multivariate outlier removal method was adjusted to account for the statistical distributions of the data, significantly reducing data loss. An effective feature selection methodology identified a trade-off between data acquisition, model performance, and complexity: the number of features was reduced by 40 %, decreasing data collection cost, model complexity, and computational burden without a significant effect on modeling accuracy. The best performing models are kNN (R² = 0.88), AdaBoost (R² = 0.94), and DNN (R² = 0.90). The feature importance of the models was analyzed and compared with process knowledge to test interpretability, guiding N2O mitigation decisions.
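The trade-off between feature count and accuracy described above can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not specify the feature selection method, the data here are synthetic, and the function names (`r2_score`, `knn_predict`, `evaluate`) and the chosen feature subset are hypothetical. The sketch fits a plain kNN regressor on five features, then on three of them (a 40 % reduction), and compares the coefficient of determination on held-out samples.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train, y_train, x, k=5):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(X_train, y_train))
    return sum(y for _, y in nearest[:k]) / k


def evaluate(X_train, y_train, X_test, y_test, cols, k=5):
    """R^2 of kNN on the test set when only the feature columns in `cols` are used."""
    def select(X):
        return [[row[c] for c in cols] for row in X]

    X_tr, X_te = select(X_train), select(X_test)
    preds = [knn_predict(X_tr, y_train, x, k) for x in X_te]
    return r2_score(y_test, preds)


# Synthetic stand-in data (assumption): 5 features, only the first 3 drive the target.
rng = random.Random(0)
X = [[rng.uniform(0.0, 1.0) for _ in range(5)] for _ in range(400)]
y = [2.0 * r[0] + math.sin(3.0 * r[1]) + r[2] ** 2 + rng.gauss(0.0, 0.05) for r in X]

# Simple hold-out split: first 300 samples for training, last 100 for testing.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

r2_all = evaluate(X_train, y_train, X_test, y_test, cols=[0, 1, 2, 3, 4])
r2_reduced = evaluate(X_train, y_train, X_test, y_test, cols=[0, 1, 2])

print(f"R2 with 5 features: {r2_all:.3f}")
print(f"R2 with 3 features: {r2_reduced:.3f}")
```

On real plant data the reduced feature set would have to come from a selection procedure and process knowledge rather than being fixed by hand as it is here, and the same comparison would be repeated for each candidate algorithm.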

Keywords: Artificial intelligence; Data-driven models; Greenhouse gas emissions; Nitrous oxide; Soft sensors; Water resource recovery facilities.

MeSH terms

  • Bioreactors
  • Machine Learning
  • Nitrous Oxide / analysis
  • Wastewater*
  • Water Purification* / methods

Substances

  • Wastewater
  • Nitrous Oxide