Learning hierarchical Bayesian networks to assess the interaction effects of controlling factors on spatiotemporal patterns of fecal pollution in streams

Sci Total Environ. 2022 Mar 15;812:152520. doi: 10.1016/j.scitotenv.2021.152520. Epub 2021 Dec 22.

Abstract

The dynamics of fecal indicator bacteria, such as fecal coliforms (FC), in streams are influenced by interactions among a myriad of factors. To predict complex spatiotemporal patterns of FC in streams and to assess the relative importance of numerous controlling factors, this study proposes a hierarchical Bayesian network (HBN). By introducing into a Bayesian network latent variables that are correlated with the observed variables, the HBN can represent causal relationships among a large set of variables in a multilevel hierarchy. The study area encompasses 215 sites across the watersheds of the four major rivers in South Korea. The monitoring data collected during 2012-2019 included 32 input variables pertaining to meteorology, geography, soil characteristics, land cover, urbanization index, livestock density, and point sources. The model endpoints were the probability of exceeding the FC standard concentration and two pollution characteristics (pollution degree and pollution type) derived from FC load duration curves. The probability of exceeding the FC threshold (200 CFU/100 mL) varied in both space and time, whereas pollution degree and pollution type varied spatially, representing the long-term severity of fecal pollution and the relative dominance of nonpoint versus point sources, respectively. Before the HBN was developed, its conceptual model was validated using structural equation modeling. The results demonstrate that the HBN effectively simplified the model structure while achieving strong performance (AUC = 0.81, accuracy = 0.74). Sensitivity analysis indicates that land cover is the most important factor for predicting the exceedance probability and pollution degree, whereas the urbanization index explains most of the variability in pollution type. Furthermore, scenario analysis suggests that the HBN provides an interpretable framework in which causal relationships among interacting controlling factors can be identified and visualized at different hierarchical levels.
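The abstract refers to FC load duration curves and an FC threshold of 200 CFU/100 mL but does not spell out the computation. The following is a minimal sketch of a conventional load-duration-curve construction, assuming daily mean streamflow in m^3/s, FC concentrations in CFU/100 mL, and Weibull plotting positions for flow exceedance. All names (`Sample`, `daily_load`, `flow_exceedance_percent`, `load_duration_points`, `exceedance_frequency`) and the sample values are hypothetical. It does not reproduce how the authors derived pollution degree and pollution type, nor the HBN-predicted exceedance probability; it only computes the empirical quantities such a curve is built from.

```python
from __future__ import annotations

from dataclasses import dataclass

# Unit conversion (assumption: flow in m^3/s, concentration in CFU/100 mL).
# 1 m^3 = 1,000,000 mL = 10,000 units of 100 mL.
HUNDRED_ML_PER_M3 = 10_000
SECONDS_PER_DAY = 86_400
FC_STANDARD = 200.0  # CFU/100 mL, the threshold cited in the abstract


@dataclass
class Sample:
    """Hypothetical paired observation at one monitoring site."""
    flow_m3s: float      # daily mean streamflow (m^3/s)
    fc_cfu_100ml: float  # measured FC concentration (CFU/100 mL)


def daily_load(flow_m3s: float, conc_cfu_100ml: float) -> float:
    """FC load in CFU/day carried by a given flow at a given concentration."""
    return flow_m3s * SECONDS_PER_DAY * HUNDRED_ML_PER_M3 * conc_cfu_100ml


def flow_exceedance_percent(flows: list[float]) -> list[float]:
    """Percent of time each flow is equaled or exceeded (Weibull: rank / (n + 1)).

    Flows are ranked from highest (rank 1) to lowest; ties receive
    consecutive ranks, which is acceptable for a sketch.
    """
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / (n + 1)
    return pct


def load_duration_points(samples: list[Sample]) -> list[tuple[float, float, float, bool]]:
    """For each sample: (flow exceedance %, allowable load, observed load, exceeds?)."""
    pct = flow_exceedance_percent([s.flow_m3s for s in samples])
    points = []
    for s, p in zip(samples, pct):
        allowable = daily_load(s.flow_m3s, FC_STANDARD)  # load at the standard
        observed = daily_load(s.flow_m3s, s.fc_cfu_100ml)
        points.append((p, allowable, observed, observed > allowable))
    return points


def exceedance_frequency(samples: list[Sample]) -> float:
    """Empirical fraction of samples whose FC concentration exceeds the standard."""
    return sum(s.fc_cfu_100ml > FC_STANDARD for s in samples) / len(samples)


if __name__ == "__main__":
    # Made-up observations for illustration only.
    obs = [Sample(12.0, 150.0), Sample(3.5, 420.0), Sample(0.8, 90.0), Sample(25.0, 1300.0)]
    for point in load_duration_points(obs):
        print(point)
    print("empirical exceedance frequency:", exceedance_frequency(obs))
```

Plotting observed loads against the allowable-load curve over the flow exceedance percentage shows under which flow regimes the standard is exceeded; exceedances concentrated at high flows are commonly read as a nonpoint-source signal and those at low flows as a point-source signal, which is the kind of distinction the pollution-type endpoint captures.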

Keywords: Fecal coliforms; Fecal indicator bacteria; Hierarchical Bayesian network; Land cover; Load duration curve; Structural equation modeling.

MeSH terms

  • Bayes Theorem
  • Environmental Monitoring
  • Feces
  • Rivers*
  • Water Microbiology*
  • Water Pollution / analysis