Plasma protein binding (PPB) is an important pharmacokinetic property of compounds in drug discovery and design. Because experimental assays are costly and time-consuming, in silico approaches have been developed to assess the binding profiles of chemicals. However, because of data ambiguity and the lack of uniform experimental data, most available predictive models are far from satisfactory. In this study, a carefully curated training set of 967 diverse pharmaceuticals with known plasma-protein-bound fractions (fb) was used to construct quantitative structure-activity relationship (QSAR) models with six machine learning algorithms and 26 molecular descriptors. We further combined all of the individual learners into a consensus model, which marginally improved prediction accuracy. Model performance was estimated by tenfold cross-validation and by three external validation sets comprising 242 pharmaceutical, 397 industrial, and 231 newly designed chemicals, respectively. The models showed excellent performance on all test sets, with mean absolute errors (MAE) ranging from 0.126 to 0.178, indicating that they could be used by a chemist to evaluate a molecular structure drawn from scratch. Meanwhile, the structural descriptors contributing most to the models' predictive power were related to the binding mechanisms, and the trends in their effects on PPB can guide the structural modification of chemicals. An applicability domain was also defined to distinguish favorable from unfavorable predictions.
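The evaluation workflow summarized above (tenfold cross-validation, a consensus of several learners, and MAE on predicted fb) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the individual learners were combined, so an unweighted mean of their predictions is assumed here, with the result clipped to [0, 1] because fb is a fraction. The function names `kfold_indices`, `consensus_predict`, and `mean_absolute_error` are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import random
from statistics import mean


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds; each fold serves as the test set exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f_i, f in enumerate(folds) if f_i != i for j in f)
        yield train, test


def consensus_predict(model_preds):
    """Average per-compound fb predictions across models (assumed: unweighted mean).

    model_preds is a list of prediction lists, one per model, all covering
    the same compounds in the same order. Results are clipped to [0, 1].
    """
    n = len(model_preds[0])
    if any(len(p) != n for p in model_preds):
        raise ValueError("all models must predict the same compounds")
    return [min(1.0, max(0.0, mean(p[i] for p in model_preds))) for i in range(n)]


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|observed - predicted|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    preds = [[0.90, 0.40], [0.80, 0.50], [0.85, 0.60]]  # three hypothetical models
    observed = [0.95, 0.45]
    cons = consensus_predict(preds)
    print(cons, mean_absolute_error(observed, cons))
```

With these made-up values the consensus predictions are about 0.85 and 0.50, giving an MAE of about 0.075. A weighted average or a stacked meta-learner would be alternative ways to build the consensus. The abstract does not specify which was used.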
Keywords: QSAR; consensus modeling; machine learning; pharmacokinetics; plasma protein binding.
© 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.