In Silico Prediction of Blood-Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods

ChemMedChem. 2018 Oct 22;13(20):2189-2201. doi: 10.1002/cmdc.201800533. Epub 2018 Sep 21.

Abstract

The blood-brain barrier (BBB), as part of absorption, protects the central nervous system by separating brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all published models were built on imbalanced data sets, which led to high false-positive rates. We therefore addressed the problem of biased data sets and built a reliable classification model from 2358 compounds. Machine learning and resampling methods were applied together to refine the models, using both 2D molecular descriptors and molecular fingerprints to represent the compounds. A series of evaluations showed that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with edited nearest neighbors (SMOTE+ENN) effectively solved the problem of imbalanced data sets, and that MACCS fingerprints combined with a support vector machine (SVM) performed best. The final consensus model raised the overall accuracy to 0.966 on the final external data set. On the test set, the model's accuracy was 0.919, with well-balanced performance: a sensitivity of 0.925 for predicting BBB-positive compounds and a specificity of 0.899 for predicting BBB-negative compounds. Compared with other BBB classification models, our models reduced the false-positive rate and were more robust in predicting both BBB-positive and BBB-negative compounds, which should be helpful in early drug discovery.
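The resampling and evaluation ideas named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names `smote`, `edited_nearest_neighbours` and `sensitivity_specificity_accuracy`, the Euclidean distance, the neighbour counts `k`, and the toy two-dimensional data are illustrative assumptions (real 2D descriptors or MACCS fingerprints have many more dimensions, and a library such as imbalanced-learn would normally be used). SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours; ENN (Wilson's editing rule) then drops samples whose label disagrees with the majority label of their nearest neighbours. Sensitivity and specificity are computed with label 1 as the positive (BBB-positive) class; which class is the minority in the toy data is chosen purely for illustration.

```python
import math
import random
from collections import Counter


def _nearest(points, i, k):
    """Indices of the k points nearest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE sketch: create n_new synthetic samples, each on the line segment
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbour = minority[rng.choice(_nearest(minority, i, k))]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], neighbour)])
    return synthetic


def edited_nearest_neighbours(X, y, k=3):
    """ENN sketch (Wilson's rule): keep a sample only if its label matches the
    majority label among its k nearest neighbours in the whole data set."""
    kept_X, kept_y = [], []
    for i, label in enumerate(y):
        votes = Counter(y[j] for j in _nearest(X, i, k))
        if votes.most_common(1)[0][0] == label:
            kept_X.append(X[i])
            kept_y.append(label)
    return kept_X, kept_y


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Hypothetical toy 2-D data: 6 majority (label 1) vs 3 minority (label 0) samples.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class up to the majority size, then clean with ENN (SMOTE+ENN).
synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
X = majority + minority + synthetic
y = [1] * len(majority) + [0] * (len(minority) + len(synthetic))
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
print(Counter(y_clean))
```

Running ENN after SMOTE is meant to remove noisy samples that oversampling may have placed inside the other class's region. In this toy example the two clusters are well separated, so ENN removes nothing and the cleaned set holds six samples per class. Sensitivity and specificity, unlike plain accuracy, expose the kind of bias toward the majority class that the abstract attributes to imbalanced training data.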

Keywords: QSAR models; blood-brain barrier; imbalanced data; machine learning; resampling methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Blood-Brain Barrier / metabolism*
  • Computer Simulation*
  • Databases, Chemical / statistics & numerical data*
  • Models, Chemical
  • Organic Chemicals / chemistry
  • Organic Chemicals / pharmacokinetics*
  • Permeability
  • Support Vector Machine*

Substances

  • Organic Chemicals