kScore: a novel machine learning approach that is not dependent on the data structure of the training set

Scott Oloff; Ingo Muegge

doi:10.1007/s10822-007-9108-0

J Comput Aided Mol Des. 2007 Jan-Mar;21(1-3):87-95. doi: 10.1007/s10822-007-9108-0. Epub 2007 Feb 28.

Authors

Scott Oloff¹, Ingo Muegge

Affiliation

¹ Boehringer Ingelheim Pharmaceuticals Inc., 900 Ridgebury Road, P.O. Box 368, Ridgefield, CT 06877-368, USA.

PMID: 17333481

Abstract

Current machine learning approaches used to generate Quantitative Structure-Activity Relationship (QSAR) models impose restrictions on, or make assumptions about, how the training set descriptors correlate with a target activity. kScore has been developed as the first machine learning approach that does not require the training data to conform to a defined kernel, accommodates uneven distributions of data points in the descriptor space, and optimizes the weight of each descriptor dimension in order to identify the descriptors most relevant to the target property. Because kScore can adapt to virtually any correlation, generalization terms must be included to inhibit overtraining; the Structural Risk Minimization principle and linear epsilon-insensitive loss terms have therefore been added to the kScore optimization function. The resulting algorithm has proven broadly applicable across several datasets, either matching or outperforming the most predictive machine learning algorithms tested, including SVM, kNN, Recursive Partitioning, Neural Networks, Gaussian Process, and the Bayesian Classifier.
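The abstract names two generalization ingredients, the Structural Risk Minimization principle and a linear epsilon-insensitive loss, but does not give the kScore objective itself. As an illustration only, the sketch below shows how those two terms combine in the standard linear, SVR-style regularized risk, (1/n) Σ max(0, |y_i − f(x_i)| − ε) + λ‖w‖², with one weight per descriptor. It is NOT the kScore scoring function: the linear model, the L2 penalty standing in for SRM capacity control, the subgradient-descent fit, the synthetic data, and the names `fit_linear_eps_model`, `regularized_risk`, `epsilon`, `lam`, and `lr` are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch only: a generic linear model with an epsilon-insensitive
# loss and an L2 penalty. This is an assumption, not the published kScore method.

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Linear epsilon-insensitive loss: residuals inside +/- epsilon cost nothing,
    larger residuals are penalized linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

def regularized_risk(w, b, X, y, epsilon, lam):
    """Mean epsilon-insensitive loss plus an L2 penalty on the per-descriptor
    weights (a common SRM-style trade-off between fit and model capacity)."""
    pred = X @ w + b
    return epsilon_insensitive_loss(y, pred, epsilon).mean() + lam * np.dot(w, w)

def fit_linear_eps_model(X, y, epsilon=0.1, lam=1e-3, lr=0.05, n_iter=2000):
    """Minimize regularized_risk by full-batch subgradient descent.
    Returns one weight per descriptor (column of X) and a bias term."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        residual = X @ w + b - y
        # Subgradient of max(0, |r| - eps) with respect to the prediction:
        # sign(r) outside the epsilon tube, 0 inside it.
        g = np.where(np.abs(residual) > epsilon, np.sign(residual), 0.0)
        grad_w = X.T @ g / n + 2.0 * lam * w
        grad_b = g.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Usage on hypothetical synthetic data: only descriptors 0 and 2 drive the activity.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.05 * rng.normal(size=200)
w, b = fit_linear_eps_model(X, y)
print(np.round(w, 2))  # descriptors 0 and 2 should carry most of the weight
```

Residuals smaller than `epsilon` contribute no loss, so the fit is not pulled toward the noise inside the tube, while `lam` trades training error against the size of the descriptor weights. These are the textbook forms of the overtraining controls the abstract mentions; how kScore itself combines them is described in the paper, not here.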

Publication types

Comparative Study

MeSH terms

Artificial Intelligence*
Drug Design*
Models, Theoretical
Quantitative Structure-Activity Relationship