Human-centered evaluation of explainable AI applications: a systematic review

Jenia Kim; Henry Maathuis; Danielle Sent

doi:10.3389/frai.2024.1456486

Front Artif Intell. 2024 Oct 17:7:1456486. doi: 10.3389/frai.2024.1456486. eCollection 2024.

Authors

Jenia Kim^#¹, Henry Maathuis^#¹,², Danielle Sent¹,²

Affiliations

¹ HU University of Applied Sciences Utrecht, Research Group Artificial Intelligence, Utrecht, Netherlands.
² Jheronimus Academy of Data Science, Tilburg University, Eindhoven University of Technology, 's-Hertogenbosch, Netherlands.

^# Contributed equally.

Abstract

Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there has been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation "good" from a user's perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.

Keywords: XAI; XAI evaluation; explainable AI; human-AI interaction; human-AI performance; human-centered evaluation; meaningful explanations; systematic review.

Publication types

Systematic Review

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was conducted as part of the FIN-X project, funded by the Dutch National Organisation for Practice-Oriented Research SIA with file number RAAK.MKB17.003.