CFSBoost: Cumulative feature subspace boosting for drug-target interaction prediction

J Theor Biol. 2019 Mar 7:464:1-8. doi: 10.1016/j.jtbi.2018.12.024. Epub 2018 Dec 19.

Abstract

Experimentally identifying drug-target interactions is labor-intensive and expensive, which has motivated researchers to focus on in silico prediction to provide information on potential interactions. In recent years, researchers have proposed several computational approaches for predicting new drug-target interactions. In this paper, we present CFSBoost, a simple and computationally cheap ensemble boosting classification model for the identification and prediction of drug-target interactions using evolutionary and structural features. CFSBoost uses a simple yet novel feature group selection procedure that keeps the model computationally very cheap while still achieving state-of-the-art performance. The ensemble model uses extra-tree classifiers as weak learners inside a boosting scheme and retains the best model at each iteration. We tested our method on four benchmark datasets, which are also referred to as gold standard datasets. Our method achieved a better area under the receiver operating characteristic curve (auROC) on 2 of the 4 datasets and a higher area under the precision-recall curve (auPR) on 3 of the 4 datasets. Researchers have argued that auPR is more suitable than auROC for comparing performance on imbalanced datasets such as our benchmark datasets. Our reported results show that, despite its simple design, CFSBoost performs very satisfactorily compared with other methods in the literature. We also provide 5 new possible interactions for each dataset based on CFSBoost's prediction scores.
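To illustrate why auPR can be more informative than auROC on imbalanced data, the following is a minimal sketch on a made-up toy ranking. It is not taken from the paper: the data, the function names `au_roc` and `average_precision`, and the use of average precision as a step-wise estimate of auPR are all assumptions made for illustration.

```python
def au_roc(labels, scores):
    """auROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise auPR estimate: mean precision at the rank of each positive.

    Assumes there are no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 98 non-interacting pairs and 2 interacting pairs.
neg_scores = [i / 100 for i in range(98)]   # 0.00, 0.01, ..., 0.97
pos_scores = [0.955, 0.935]                 # ranked 3rd and 6th overall
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

roc = au_roc(labels, scores)
ap = average_precision(labels, scores)
print(round(roc, 3), round(ap, 3))  # 0.969 0.333
```

In this toy case only six negatives outrank a positive, so auROC stays near 0.97, yet two of every three top-ranked predictions at the positives' ranks are false, which the auPR estimate of about 0.33 makes visible.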

Keywords: Boosting; Class imbalance; Classification; Drug-target; Ensemble classifier; Feature grouping.

MeSH terms

  • Algorithms*
  • Computational Biology*
  • Computer Simulation*
  • Drug Discovery*
  • Humans
  • Models, Chemical*