Kernel-based partial least squares: application to fingerprint-based QSAR with model visualization

J Chem Inf Model. 2013 Sep 23;53(9):2312-21. doi: 10.1021/ci400250c. Epub 2013 Aug 19.

Abstract

Numerous regression-based and machine learning techniques are available for the development of linear and nonlinear QSAR models that can accurately predict biological endpoints. Such tools can be quite powerful in the hands of an experienced modeler, but too frequently a disconnect remains between the modeler and project chemist because the resulting QSAR models are effectively black boxes. As a result, learning methods that yield models that can be visualized in the context of chemical structures are in high demand. In this work, we combine direct kernel-based PLS with Canvas 2D fingerprints to arrive at predictive QSAR models that can be projected onto the atoms of a chemical structure, allowing immediate identification of favorable and unfavorable characteristics. The method is validated using binding affinities for ligands from 10 different protein targets covering 7 distinct protein families. Models with significant predictive ability (test set Q² > 0.5) are obtained for 6 of 10 data sets, and fingerprints are shown to consistently outperform large collections of classical physicochemical and topological descriptors. In addition, we demonstrate how a simple bootstrapping technique may be employed to obtain uncertainties that provide meaningful estimates of prediction accuracy.
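The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes fingerprints are represented as sets of on-bits, uses Tanimoto similarity as the kernel (an assumption; the abstract does not name the kernel), treats the training kernel matrix as an ordinary descriptor matrix that is column-centred and passed to a plain PLS1 (NIPALS) fit, and estimates uncertainties as the spread of predictions over bootstrap resamples of the training set. All function names (`tanimoto`, `fit_kpls`, `predict_kpls`, `bootstrap_predict`) are hypothetical.

```python
import random

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def kernel_rows(fps, ref_fps):
    """Kernel matrix: one row per fingerprint in fps, one column per reference."""
    return [[tanimoto(f, r) for r in ref_fps] for f in fps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_kpls(train_fps, y, n_components=2):
    """PLS1 (NIPALS) on the column-centred training kernel matrix (assumed scheme)."""
    n = len(train_fps)
    K = kernel_rows(train_fps, train_fps)
    col_mean = [sum(row[j] for row in K) / n for j in range(n)]
    y_mean = sum(y) / n
    X = [[row[j] - col_mean[j] for j in range(n)] for row in K]
    r = [v - y_mean for v in y]                    # current response residual
    comps = []
    for _ in range(n_components):
        w = [dot([row[j] for row in X], r) for j in range(n)]   # w = X^T r
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                           # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]             # scores
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(n)]  # loadings
        q = dot(r, t) / tt                         # regression of residual on scores
        X = [[row[j] - t[i] * p[j] for j in range(n)] for i, row in enumerate(X)]
        r = [ri - q * ti for ri, ti in zip(r, t)]
        comps.append((w, p, q))
    return {"fps": train_fps, "col_mean": col_mean, "y_mean": y_mean, "comps": comps}

def predict_kpls(model, fps):
    """Predict by replaying the training deflation steps on each new kernel row."""
    preds = []
    for row in kernel_rows(fps, model["fps"]):
        x = [v - m for v, m in zip(row, model["col_mean"])]
        yhat = model["y_mean"]
        for w, p, q in model["comps"]:
            t = dot(x, w)
            yhat += q * t
            x = [xi - t * pi for xi, pi in zip(x, p)]
        preds.append(yhat)
    return preds

def bootstrap_predict(train_fps, y, query_fps, n_boot=50, n_components=2, seed=0):
    """Mean and standard deviation of predictions over bootstrap-resampled models."""
    rng = random.Random(seed)
    n = len(y)
    all_preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        model = fit_kpls([train_fps[i] for i in idx], [y[i] for i in idx], n_components)
        all_preds.append(predict_kpls(model, query_fps))
    means, sds = [], []
    for j in range(len(query_fps)):
        col = [p[j] for p in all_preds]
        m = sum(col) / n_boot
        means.append(m)
        sds.append((sum((v - m) ** 2 for v in col) / (n_boot - 1)) ** 0.5)
    return means, sds
```

Calling `bootstrap_predict(train_fps, y, query_fps)` returns one mean prediction and one standard deviation per query compound; the standard deviation plays the role of the bootstrap uncertainty mentioned above. Column-centring the kernel matrix is a simplification chosen for brevity; a production implementation would more likely rely on a tested PLS routine and a cheminformatics toolkit for fingerprint generation.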

MeSH terms

  • Computational Biology / methods*
  • Computer Graphics
  • Endpoint Determination
  • Least-Squares Analysis
  • Quantitative Structure-Activity Relationship*
  • Uncertainty