Effective prediction of organosilicon molecular structures and risks in aquatic environment with machine learning

Sci Total Environ. 2025 Jan 10:959:178320. doi: 10.1016/j.scitotenv.2024.178320. Epub 2025 Jan 3.

Abstract

Mass spectrometry databases still lack molecular information for most organosilicon oligomers, and risk models, which require accurate molecular descriptors, are unavailable for these emerging contaminants with thousands of monomers. To address this issue, this study used molecular/fragment ions and their relative abundances from GC-Orbitrap-MS to develop classification (accuracies = 0.750-0.804) and regression (MSE = 0.008-0.014) models, built on neural network and support vector frameworks, for the main-chain and branch-chain structures of organosilicons. The predicted structures were then used to estimate persistent, bioaccumulative, and toxic (PBT) potentials with neural networks (MSE = 0.002-0.017). With these methods, 116 oligomers [with 1-7 Si atoms, SiO (68.6 %) or CC (31.4 %) backbones, cyclic (14.7 %) or linear (85.3 %) structures, and six kinds of branch groups] were identified in waters from 21 Chinese cities. Hazard indices of total organosilicons exceeded 1 in 17 cities, with 5-43 oligomers, found in rivers for the first time, showing persistent, bioaccumulative, or toxic potential. Characteristic oligomers indicated that the dyeing, textile, and petrochemical industries made major contributions (13.1-34.8 %) to local organosilicon emissions, and the petrochemical industry was identified for the first time as a ubiquitous source of the nationwide organosilicon distribution. This study provides a valuable methodology for the risk assessment of organosilicons and of other chemicals lacking MS database coverage.

Keywords: Emerging contaminants; Machine learning; Non-target screening; Organosilicons; PBT.
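
The abstract reports hazard indices of total organosilicons above 1 in 17 cities but does not state how the index was computed. As a minimal sketch, assuming the common mixture-risk convention in which a hazard index is the sum of per-compound hazard quotients (measured concentration divided by a reference value such as a predicted no-effect concentration), the calculation could look like the following. The function name, oligomer names, and all numbers are hypothetical illustrations, not data or methods from the study.

```python
def hazard_index(concentrations, reference_values):
    """Sum per-compound hazard quotients (concentration / reference value).

    Assumption: the index is additive across compounds; the study's own
    formula is not given in the abstract.
    """
    total = 0.0
    for name, conc in concentrations.items():
        ref = reference_values[name]
        if ref <= 0:
            raise ValueError(f"reference value for {name} must be positive")
        total += conc / ref
    return total


# Hypothetical values in the same concentration unit (e.g. ng/L).
sample = {"oligomer_A": 12.0, "oligomer_B": 3.0}
no_effect = {"oligomer_A": 20.0, "oligomer_B": 4.0}

hi = hazard_index(sample, no_effect)
exceeds_threshold = hi > 1.0  # an index above 1 flags potential risk
print(f"hazard index = {hi:.2f}, exceeds 1: {exceeds_threshold}")
```

Under this convention, a mixture can exceed the threshold of 1 even when every individual quotient is below 1, which is why the index is reported for total organosilicons rather than per oligomer.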