Explaining Support Vector Machines: A Color Based Nomogram

Vanya Van Belle; Ben Van Calster; Sabine Van Huffel; Johan A K Suykens; Paulo Lisboa

doi:10.1371/journal.pone.0164568

PLoS One. 2016 Oct 10;11(10):e0164568. doi: 10.1371/journal.pone.0164568. eCollection 2016.

Authors

Vanya Van Belle¹,², Ben Van Calster³, Sabine Van Huffel¹,², Johan A K Suykens¹,², Paulo Lisboa⁴

Affiliations

¹ Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium.
² iMinds Medical IT, Leuven, Belgium.
³ Department of Development and Regeneration, KU Leuven, Leuven, Belgium.
⁴ Department of Applied Mathematics, Liverpool John Moores University, Liverpool, United Kingdom.

Abstract

Problem setting: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Because they can be combined with a wide choice of kernels, they can be used to analyse a large variety of data. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models.

Objective: In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions, each of which depends on a single input variable or at most two input variables.
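
One way to write this definition of explainability as a formula is sketched below. The notation is an assumption introduced here for illustration and is not taken from the abstract: d is the number of input variables, b a constant term, f_i the contribution of variable x_i alone, and f_ij the contribution of the pair (x_i, x_j).

```latex
% Illustrative notation (assumed): a decision value built only from
% single-variable and two-variable contributions.
f(x) = b + \sum_{i=1}^{d} f_i(x_i) + \sum_{i=1}^{d} \sum_{j=i+1}^{d} f_{ij}(x_i, x_j)
```

Under this reading, a model is explainable when its decision value can be written, exactly or to a good approximation, in this form.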

Results: Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable.

Conclusions: This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package, and two apps and a movie are provided to illustrate the possibilities offered by the method.
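
The statement that second-degree polynomial kernels are represented exactly can be checked with a small numerical sketch. The code below is not the authors' R package; it assumes a kernel of the form (x·s + c)^2 with hypothetical support vectors `sv`, dual coefficients `coef` and bias `b`, and shows that the decision value splits into a constant, one term per input variable, and one term per pair of variables. All function and parameter names are illustrative.

```python
import numpy as np


def poly2_decision(x, sv, coef, b, c=1.0):
    """Decision value f(x) = sum_k coef_k * (x . sv_k + c)^2 + b."""
    return float(coef @ (sv @ x + c) ** 2 + b)


def poly2_contributions(x, sv, coef, b, c=1.0):
    """Split f(x) into a constant, per-variable and per-pair contributions.

    Expanding (sum_i x_i s_i + c)^2 gives
      c^2 + 2c * sum_i x_i s_i + sum_i x_i^2 s_i^2 + 2 * sum_{i<j} x_i x_j s_i s_j,
    so every term involves at most two input variables.
    """
    d = x.shape[0]
    intercept = b + c ** 2 * coef.sum()
    lin = 2.0 * c * (coef @ sv)      # weight of x_i, shape (d,)
    quad = coef @ sv ** 2            # weight of x_i^2, shape (d,)
    main = lin * x + quad * x ** 2   # contribution of each single variable
    # pair_w[i, j] = 2 * sum_k coef_k * sv[k, i] * sv[k, j]
    pair_w = 2.0 * (sv.T * coef) @ sv
    pair = {
        (i, j): pair_w[i, j] * x[i] * x[j]
        for i in range(d)
        for j in range(i + 1, d)
    }
    return intercept, main, pair


# Small check on random (made-up) values: the parts add up to f(x).
rng = np.random.default_rng(0)
sv = rng.normal(size=(5, 3))   # 5 hypothetical support vectors, 3 inputs
coef = rng.normal(size=5)      # hypothetical dual coefficients
x = rng.normal(size=3)
intercept, main, pair = poly2_contributions(x, sv, coef, b=0.3)
total = intercept + main.sum() + sum(pair.values())
assert np.isclose(total, poly2_decision(x, sv, coef, b=0.3))
```

For kernels of higher degree or for RBF kernels, no such exact split into one- and two-variable terms exists in general, which is consistent with the abstract reporting a reliability indication for those cases.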

MeSH terms

Color
Nomograms*
Support Vector Machine*

Grants and funding

V. Van Belle is a postdoctoral fellow of the Research Foundation Flanders (FWO). This research was supported by: Center of Excellence (CoE): PFV/10/002 (OPTEC); iMinds Medical Information Technologies; Belgian Federal Science Policy Office: IUAP P7/19/ (DYSCO, ‘Dynamical systems, control and optimization’, 2012–2017); European Research Council: ERC Advanced Grant (339804) BIOTENSORS. This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information. JS acknowledges support of ERC AdG A-DATADRIVE-B, FWO G.0377.12, G.088114N. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.