Virtual sample generation empowers machine learning-based effluent prediction in constructed wetlands

J Environ Manage. 2023 Nov 15:346:118961. doi: 10.1016/j.jenvman.2023.118961. Epub 2023 Sep 13.

Abstract

The design of constructed wetlands (CWs) is critical to ensuring effective wastewater treatment. However, the limited availability of reliable data can hamper the accuracy of CW effluent predictions, increasing design costs and time. In this study, a novel effluent prediction framework for CWs is proposed that uses data dimensionality reduction and virtual sample generation. Using four machine learning algorithms (Cubist, random forest, support vector regression, and extreme learning machine), important features of CW design are identified and used to build prediction models. The extreme learning machine algorithm achieved the highest determination coefficient and the lowest error, making it the most suitable algorithm for effluent prediction. A multi-distribution mega-trend-diffusion algorithm with particle swarm optimization was employed to generate virtual samples. These virtual samples were then combined with real samples to retrain the prediction model and verify the optimization effect. Comparative analysis demonstrated that integrating virtual samples significantly improved the prediction accuracy for ammonium and chemical oxygen demand: the root mean square error decreased by averages of 60.5% and 42.1%, respectively, and the mean absolute percentage error by averages of 21.5% and 23.8%, respectively. Finally, a CW design process is proposed based on the prediction models and virtual samples. This integrated forward prediction and reverse design tool can efficiently support CW design when sample sizes are limited, ultimately leading to more accurate and cost-effective design solutions.
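The two error metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions of root mean square error (RMSE) and mean absolute percentage error (MAPE) and reads "decreased by" as a relative reduction; the abstract does not state which convention was used. The function names and the concentration values are hypothetical and are not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def percent_decrease(before, after):
    """Relative reduction of an error metric, in percent (assumed convention)."""
    return 100.0 * (before - after) / before

# Hypothetical effluent concentrations (mg/L), for illustration only.
observed = [10.0, 20.0, 40.0]
pred_real_only = [14.0, 16.0, 44.0]      # model trained on real samples only
pred_with_virtual = [12.0, 18.0, 42.0]   # model retrained with virtual samples added

rmse_before = rmse(observed, pred_real_only)
rmse_after = rmse(observed, pred_with_virtual)
rmse_drop = percent_decrease(rmse_before, rmse_after)

mape_before = mape(observed, pred_real_only)
mape_after = mape(observed, pred_with_virtual)
```

In this toy comparison both models are scored against the same observed values, so any drop in RMSE or MAPE reflects only the change in training data.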

Keywords: Constructed wetland design; Effluent quality prediction; Machine learning; Virtual sample generation.