Development of a Predictive Model for N-Dealkylation of Amine Contaminants Based on Machine Learning Methods

Toxics. 2024 Dec 22;12(12):931. doi: 10.3390/toxics12120931.

Abstract

Amines are widespread environmental pollutants that may pose health risks. In particular, the N-dealkylation of amines mediated by cytochrome P450 enzymes (P450) can affect the safety of their metabolic transformation. However, conventional experimental and computational chemistry methods are poorly suited to high-throughput screening of N-dealkylation of emerging amine contaminants. Machine learning has been widely used to identify sources of environmental pollutants and to predict their toxicity, but its application to screening critical biotransformation pathways of organic pollutants has rarely been reported. In this study, we first constructed a large dataset comprising 286 emerging amine pollutants through a thorough search of databases and the literature. We then applied four machine learning methods (random forest, gradient boosting decision tree, extreme gradient boosting, and multi-layer perceptron) to develop binary classification models for N-dealkylation. These models were based on seven carefully selected molecular descriptors that represent reactivity-fit and structural-fit. Among the single-algorithm models, extreme gradient boosting showed the highest prediction accuracy, at 81.0%. The SlogP_VSA2 descriptor was the primary factor influencing predictions of N-dealkylation metabolism. An ensemble model was then generated that uses a consensus strategy to integrate three different algorithms; its performance was generally better than that of any single algorithm, with an accuracy of 86.2%. The classification model developed in this work can therefore provide methodological support for the high-throughput screening of N-dealkylation of amine pollutants.
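
The consensus strategy mentioned above can be illustrated with a minimal sketch. The abstract does not say how the consensus is formed or which three algorithms are combined, so the code below assumes a simple majority vote over the binary predictions of three classifiers. The names `consensus_vote`, `accuracy`, `model_a`, `model_b`, and `model_c`, and all labels, are hypothetical toy values, not data or results from the study.

```python
from typing import List, Sequence


def consensus_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over binary (0/1) predictions from several models.

    ASSUMPTION: plain majority voting; the study's actual consensus rule
    is not given in the abstract. Use an odd number of models to avoid ties.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    n_models = len(predictions)
    # A compound is labelled 1 (undergoes N-dealkylation) when more than
    # half of the models predict 1 for it.
    return [int(2 * sum(votes) > n_models) for votes in zip(*predictions)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for five hypothetical compounds (NOT from the study).
y_true = [1, 0, 1, 1, 0]
model_a = [0, 0, 1, 1, 0]  # wrong on compound 1
model_b = [1, 1, 1, 1, 0]  # wrong on compound 2
model_c = [1, 0, 0, 1, 0]  # wrong on compound 3

ensemble = consensus_vote([model_a, model_b, model_c])
print("ensemble labels:", ensemble)
print("single-model accuracy:", accuracy(y_true, model_a))
print("ensemble accuracy:", accuracy(y_true, ensemble))
```

In this toy case each model errs on a different compound, so the vote corrects every individual error. That is the general reason a consensus of models with partly independent errors can outperform its members, although the size of any gain depends on the data and models involved.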

Keywords: N-dealkylation reaction; amine contaminants; binary classification; biotransformation; cytochrome P450 enzymes; machine learning.