What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks

IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4612-4624. doi: 10.1109/TPAMI.2024.3357717. Epub 2024 Jun 5.

Abstract

Model explainability is one of the crucial ingredients for building trustable AI systems, especially in the applications requiring reliability such as automated driving and diagnosis. Many explainability methods have been studied in the literature. Among many others, this article focuses on a research line that tries to visually explain a pre-trained image classification model such as Convolutional Neural Network by discovering concepts learned by the model, which is so-called the concept-based explanation. Previous concept-based explanation methods rely on the human definition of concepts (e.g., the Broden dataset) or semantic segmentation techniques like Slic (Simple Linear Iterative Clustering). However, we argue that the concepts identified by those methods may show image parts which are more in line with a human perspective or cropped by a segmentation method, rather than purely reflect a model's own perspective. We propose Model-Oriented Concept Extraction (MOCE), a novel approach to extracting key concepts based solely on a model itself, thereby being able to capture its unique perspectives which are not affected by any external factors. Experimental results on various pre-trained models confirmed the advantages of extracting concepts by truly representing the model's point of view.