Accurately predicting plant cuticle-air partition coefficients (Kca) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured Kca values from 25 plant species and 106 compounds (dataset (I)) and averaged them to establish a dataset (dataset (II)) containing Kca values for 106 compounds. Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting Kca. The results showed that the developed models had a high goodness of fit, as well as good robustness and predictive performance. The GBDT-2 model (Radj2 = 0.925, QLOO2 = 0.756, QBOOT2 = 0.864, Rext2 = 0.837, Qext2 = 0.811, and CCC = 0.891) is recommended as the best model for predicting Kca due to its superior performance. Moreover, interpreting the GBDT-1 and GBDT-2 models based on the Shapley additive explanations (SHAP) method elucidated how molecular properties, such as molecular size, polarizability, and molecular complexity, affected the capacity of plant cuticles to adsorb organic pollutants in the air. The satisfactory performance of the developed models suggests that they have the potential for extensive applications in guiding the environmental fate of organic pollutants and promoting the progress of eco-friendly and sustainable chemical engineering.
Keywords: QSPR; machine learning; organic pollutants; plant cuticle–air partition coefficient.