Shapley variable importance cloud for interpretable machine learning

Yilin Ning; Marcus Eng Hock Ong; Bibhas Chakraborty; Benjamin Alan Goldstein; Daniel Shu Wei Ting; Roger Vaughan; Nan Liu

doi:10.1016/j.patter.2022.100452

Patterns (N Y). 2022 Feb 22;3(4):100452. doi: 10.1016/j.patter.2022.100452. eCollection 2022 Apr 8.

Authors

Yilin Ning¹; Marcus Eng Hock Ong²,³,⁴; Bibhas Chakraborty¹,²,⁵,⁶; Benjamin Alan Goldstein²,⁶; Daniel Shu Wei Ting¹,⁷,⁸; Roger Vaughan¹,²; Nan Liu¹,²,³,⁸,⁹

Affiliations

¹ Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore.
² Programme in Health Services and Systems Research, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore.
³ Health Services Research Centre, Singapore Health Services, 20 College Road, Singapore 169856, Singapore.
⁴ Department of Emergency Medicine, Singapore General Hospital, 1 Hospital Crescent Outram Road, Singapore 169608, Singapore.
⁵ Department of Statistics and Data Science, National University of Singapore, 6 Science Drive 2, Singapore 117546, Singapore.
⁶ Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Durham, NC 27710, USA.
⁷ Singapore Eye Research Institute, Singapore National Eye Centre, 11 Third Hospital Avenue, Singapore 168751, Singapore.
⁸ SingHealth AI Health Program, Singapore Health Services, 10 Hospital Boulevard, Singapore 168582, Singapore.
⁹ Institute of Data Science, National University of Singapore, 3 Research Link, Singapore 117602, Singapore.

Abstract

Interpretable machine learning has focused on explaining final models that optimize performance. The state-of-the-art Shapley additive explanations (SHAP) locally explains the impact of variables on individual predictions and has recently been extended to provide global assessments across the dataset. Our work further extends "global" assessments to a set of models that are "good enough" and are practically as relevant as the final model to a prediction task. The resulting Shapley variable importance cloud consists of Shapley-based importance measures from each good model and pools information across models to provide an overall importance measure, with uncertainty explicitly quantified to support formal statistical inference. We developed visualizations to highlight the uncertainty and to illustrate its implications for practical inference. Building on a common theoretical basis, our method seamlessly complements the widely adopted SHAP assessments of a single final model to avoid biased inference, which we demonstrate in two experiments using recidivism prediction data and clinical data.
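The abstract describes computing a Shapley-based importance measure for each "good enough" model and then pooling those measures across models. The sketch below illustrates only that general idea under stated assumptions. It is not the authors' Shapley variable importance cloud estimator. It assumes linear regression models and a hypothetical loss-reduction value function in which unobserved variables are replaced by their column means. It computes exact Shapley values by enumerating coalitions, which is feasible only for a handful of variables. It also uses a crude mean ± 1.96 SD summary across models in place of formal inference. The toy data, the coefficients of the "good" models, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial
from statistics import mean, stdev


def shapley_values(n_features, value):
    """Exact Shapley values of a set function `value` by enumerating all coalitions."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def loss_reduction_value(coefs, intercept, X, y):
    """Hypothetical global value function: MSE with no variables observed
    minus MSE when the variables in S are observed (others set to their mean)."""
    n, p = len(X), len(X[0])
    col_means = [mean(row[k] for row in X) for k in range(p)]

    def mse(S):
        total = 0.0
        for row, target in zip(X, y):
            pred = intercept + sum(
                c * (row[k] if k in S else col_means[k]) for k, c in enumerate(coefs)
            )
            total += (target - pred) ** 2
        return total / n

    base = mse(frozenset())
    return lambda S: base - mse(S)


def pool_importance(per_model):
    """Crude pooling across models: mean importance and a mean +/- 1.96 SD band (an assumption)."""
    pooled = []
    for j in range(len(per_model[0])):
        vals = [m[j] for m in per_model]
        centre = mean(vals)
        sd = stdev(vals) if len(vals) > 1 else 0.0
        pooled.append((centre, centre - 1.96 * sd, centre + 1.96 * sd))
    return pooled


# Toy data and hypothetical "good" models (illustrative coefficients, not fitted).
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0], [4.0, 3.0, 2.5], [5.0, 5.0, 2.0]]
y = [3.1, 3.9, 7.2, 7.8, 10.1]
good_models = [([1.2, 0.6, 0.1], 0.0), ([1.0, 0.8, 0.0], 0.1), ([1.4, 0.4, 0.3], -0.2)]

per_model = []
for coefs, b0 in good_models:
    v = loss_reduction_value(coefs, b0, X, y)
    per_model.append(shapley_values(len(coefs), v))

for j, (centre, lower, upper) in enumerate(pool_importance(per_model)):
    print(f"x{j}: mean={centre:.3f}, band=({lower:.3f}, {upper:.3f})")
```

Because Shapley values are efficient, each model's importances sum to that model's total loss reduction. A variable with a zero coefficient in one model (x2 in the second model) receives exactly zero importance in that model, so disagreement among good models appears as spread in the pooled band.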

Keywords: Shapley value; explainable artificial intelligence; explainable machine learning; interpretable machine learning; variable importance cloud.