kScore: a novel machine learning approach that is not dependent on the data structure of the training set

J Comput Aided Mol Des. 2007 Jan-Mar;21(1-3):87-95. doi: 10.1007/s10822-007-9108-0. Epub 2007 Feb 28.

Abstract

Current machine learning approaches used to generate Quantitative Structure Activity Relationship (QSAR) models impose restrictions on, and/or make assumptions about, how the training set descriptors correlate with a target activity. kScore has been developed as the first machine learning approach that does not require the training data to conform to a defined kernel, accommodates uneven distributions of data points in the descriptor space, and optimizes the weight of each dimension in the descriptor space in order to identify the descriptors most relevant to the target property. Because kScore can adapt to virtually any correlation, generalization terms must be included to inhibit overtraining. The Structural Risk Minimization principle and linear epsilon-insensitive loss terms have therefore been added to the kScore optimization function. Across several datasets, the resulting kScore algorithm has proven broadly applicable, either producing results similar to or outperforming the most predictive machine learning algorithms tested, such as SVM, kNN, Recursive Partitioning, Neural Networks, Gaussian Process, and the Bayesian Classifier.
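The abstract does not give kScore's actual optimization function. The sketch below only illustrates the two generalization ingredients it names. It uses the textbook linear epsilon-insensitive loss, max(0, |y - f(x)| - eps), and combines it with a squared-norm capacity term in the form of the standard linear SVR primal objective, which is a swapped-in, well-known example and not the kScore formulation. The function names, the linear model, and the parameters `C` and `eps` are assumptions chosen for illustration.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Linear epsilon-insensitive loss, elementwise: max(0, |y - f(x)| - eps).
    Residuals inside the eps "tube" cost nothing; larger ones grow linearly."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - eps)

def svr_style_objective(w, b, X, y, C=1.0, eps=0.1):
    """Standard linear SVR primal objective (illustration only, not kScore):
    0.5 * ||w||^2 acts as the SRM-motivated capacity term, and C scales the
    summed eps-insensitive loss on the training data."""
    w = np.asarray(w, dtype=float)
    y_pred = np.asarray(X, dtype=float) @ w + b  # hypothetical linear model f(x) = w.x + b
    return 0.5 * np.dot(w, w) + C * eps_insensitive_loss(y, y_pred, eps).sum()

# Usage: two of three residuals fall inside the eps tube and cost nothing.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.5]
print(svr_style_objective([1.0, 2.0], 0.0, X, y))  # 0.5*5 + (0.5 - 0.1) = 2.9
```

Raising `C` favors fitting the training data more closely, while the norm term pushes toward simpler models; trading these off is what lets a highly flexible learner resist overtraining.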

Publication types

  • Comparative Study

MeSH terms

  • Artificial Intelligence*
  • Drug Design*
  • Models, Theoretical
  • Quantitative Structure-Activity Relationship