Multiple Kernel Learning for Drug Discovery

Nicholas C V Pilkington; Matthew W B Trotter; Sean B Holden

doi:10.1002/minf.201100146

Mol Inform. 2012 Apr;31(3-4):313-22. doi: 10.1002/minf.201100146. Epub 2012 Apr 4.

Authors

Nicholas C V Pilkington¹, Matthew W B Trotter²,³, Sean B Holden⁴

Affiliations

¹ University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; phone: +44 (0)1223 763725.
² Anne McLaren Laboratory for Regenerative Medicine & Department of Surgery, University of Cambridge, UK.
³ Celgene Institute for Translational Research Europe (CITRE), Sevilla, Spain.
⁴ University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; phone: +44 (0)1223 763725. [email protected].

PMID: 27477100

Abstract

The support vector machine (SVM) has become a widely used component of chemometric analysis. We assess a relatively recent development of the algorithm, multiple kernel learning (MKL), on published structure-property relationship (SPR) data. During supervised classifier training, MKL learns a weighting across multiple kernel-based representations of the data; it can therefore describe the influence of distinct groups of structural descriptors on a single structure-property classifier without explicitly omitting any of them. We observe a statistically significant performance improvement over a conventional single-kernel SVM on all three SPR data sets analysed. Furthermore, the MKL output provides useful information about the relative influence of the five distinct descriptor subsets present in each data set.
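
The idea of weighting one kernel per descriptor group can be illustrated with a minimal sketch. It does not reproduce the MKL optimisation assessed in the article: as a stand-in, it weights each kernel by its kernel-target alignment, a simple heuristic. The two descriptor groups, the RBF kernel, the `gamma` value, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def alignment(K, y):
    """Kernel-target alignment: <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(kernels, y):
    """Weight each kernel by its (non-negative) alignment with the labels.

    This is a heuristic stand-in for MKL, not the optimisation
    used in the article.
    """
    scores = np.array([max(alignment(K, y), 0.0) for K in kernels])
    weights = scores / scores.sum()
    K_combined = sum(w * K for w, K in zip(weights, kernels))
    return weights, K_combined

# Synthetic example: one descriptor group carries the class signal,
# the other is pure noise (both groups are hypothetical).
rng = np.random.default_rng(0)
n = 60
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
informative = y[:, None] + 0.3 * rng.standard_normal((n, 3))
noise = rng.standard_normal((n, 3))

kernels = [rbf_kernel(informative, 0.5), rbf_kernel(noise, 0.5)]
weights, K_combined = combine_kernels(kernels, y)
```

The combined matrix `K_combined` could then be passed to any SVM implementation that accepts a precomputed kernel, and `weights` read as a rough indication of how much each descriptor group contributes.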

Keywords: Chemoinformatics; Drug discovery; Kernel methods; Machine learning; Structure-property relationships.