Assessing the environmental determinants of micropollutant contamination in streams using explainable machine learning and network analysis

Chemosphere. 2024 Dec 31:370:144041. doi: 10.1016/j.chemosphere.2024.144041. Online ahead of print.

Abstract

Even at trace concentrations, micropollutants such as pesticides and pharmaceuticals pose considerable ecological risks, and the increasing presence of synthetic chemicals in aquatic systems is a growing concern. However, few machine-learning (ML) approaches exist for analyzing environmental data, and the increasing complexity of ML models has made predictor-outcome relationships, particularly the interactions among multiple variables, difficult to understand. This study integrates explainable ML techniques with network analysis to identify the sources of micropollutants in a large watershed and to determine the factors affecting micropollutant levels. We assessed the performance of four ML algorithms, namely support vector machine, random forest, extreme gradient boosting (XGB), and autoencoder-XGB, in predicting micropollutant levels from the spatial characteristics of the watershed. We applied the synthetic minority oversampling technique (SMOTE) to address data imbalance. The XGB model demonstrated the best predictive performance, particularly for high concentration levels, achieving an accuracy of 87% to 99%. Shapley additive explanations (SHAP) analysis identified temperature and rainfall as significant factors. Moreover, agricultural activities contributed to pesticide pollution, whereas urban activities contributed to pharmaceutical contamination. The network analysis corroborated the SHAP findings and revealed event-specific contamination characteristics: discharge pathways were distinct during a dry summer event and shared during a wet winter event. This approach improves the understanding of contamination sources and pathways and thereby aids in developing control measures and making informed policy decisions to preserve water quality in mixed land-use areas.

Keywords: Network analysis; Pesticide; Pharmaceutical; SHAP; XGB.
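
Illustrative note: the idea behind the SHAP attributions mentioned in the abstract can be sketched with an exact Shapley value computation on a toy model. Everything below is an assumption made for illustration: the function `toy_model`, its coefficients, the three feature names, and the input values are hypothetical and are not the study's model or data. For tree ensembles such as XGB, SHAP implementations typically use a faster tree-specific algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is "absent" from a coalition is set to its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for very small examples.
    """
    n = len(x)
    features = list(range(n))

    def value(subset):
        # Features in `subset` take their observed value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical concentration score (NOT the study's model): additive
    # temperature and rainfall effects plus a rainfall-by-cropland interaction.
    temperature, rainfall, cropland = v
    return 0.5 * temperature + 0.2 * rainfall + 2.0 * cropland * rainfall


# Hypothetical observation and reference point.
x = [20.0, 10.0, 0.6]
baseline = [10.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["temperature", "rainfall", "cropland"], phi):
    print(f"{name}: {contribution:.3f}")
```

In this toy case the attributions come out at about 5 for temperature, 8 for rainfall, and 6 for cropland, and they sum to the difference between the prediction at `x` and at `baseline`. The rainfall-by-cropland interaction is split equally between the two features involved, which is one reason a complementary method such as network analysis can be useful for examining how variables act together.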