\myThanks

[s]Corresponding author

Sparse Explanations of Neural Networks Using Pruned Layer-Wise Relevance Propagation

Paulo Yanez Sarmiento Simon Witzke Nadja Klein Bernhard Y. Renard Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany Technische Universität Dortmund, Research Center Trustworthy Data Science and Security, Germany

Abstract

Explainability is a key component in many applications involving deep neural networks (DNNs). However, current explanation methods for DNNs commonly leave it to the human observer to distinguish relevant explanations from spurious noise. This is no longer feasible when going from easily human-accessible data such as images to more complex data such as genome sequences. To facilitate the accessibility of DNN outputs from such complex data and to increase explainability, we present a modification of the widely used explanation method layer-wise relevance propagation. Our approach enforces sparsity directly by pruning the relevance propagation for the different layers. Thereby, we achieve sparser relevance attributions for the input features as well as for the intermediate layers. As the relevance propagation is input-specific, we aim to prune the relevance propagation rather than the underlying model architecture. This allows different neurons to be pruned for different inputs and hence might be more appropriate to the local nature of explanation methods. To demonstrate the efficacy of our method, we evaluate it on two types of data, images and genomic sequences. We show that our modification indeed leads to noise reduction and concentrates relevance on the most important features compared to the baseline.

\keyWords

deep learning; explainable AI (XAI); genomics; pruning; sparsity

\acknowledgement

We gratefully acknowledge funding by grants KL 3037/7-1 (to NK) and RE 3474/8-1 (to BYR), project P5 in the Research Unit KI-FOR 5363 of the German Research Foundation (DFG).

1 Introduction

As the usage of deep neural networks (DNNs) has grown tremendously in a variety of fields, so has the interest in the explainability and interpretability of such models. Especially for sensitive areas such as medical imaging (Litjens et al., 2017) or genomics (Eraslan et al., 2019; Bartoszewicz et al., 2021), an understanding of how the model came to certain predictions is essential to build and ensure trustworthiness. Therefore, so-called post-hoc attribution or explanation methods have been introduced (Samek et al., 2021) to decode the black-box behavior common for DNN architectures. These methods assume an already trained model and apply a separate procedure to explain it. Instead of global explanations of the model, our focus is on local methods. Their general idea is to obtain an input-specific explanation of the decisive behavior of the model by attributing relevance scores to every input dimension based on the model’s prediction. It has been shown that several of these methods are special cases of the more general game-theoretic concept of Shapley values (Lundberg and Lee, 2017). Layer-wise relevance propagation (LRP; Bach et al., 2015) is among these methods and propagates relevance backward through the network from one layer to another. The basic LRP approach was subsequently developed and extended to further network architectures (Binder et al., 2016; Arras et al., 2017; Schnake et al., 2021; Ali et al., 2022; Montavon et al., 2019) and has gained popularity since then.

Figure 1: Illustration of sparsified explanations using our pruned layer-wise relevance propagation (PLRP): (a) Heatmaps of explanations for image classifications obtained through LRP (left) and their corresponding sparsified versions through PLRP (right). These sparser relevance attributions are more distinct explanations of the images and can help to identify and interpret the most important features. (b) Corresponding (exemplary) DNNs show how neurons with low relevance are removed, leading to sparser relevance attribution in every layer and fewer paths through which relevance is propagated. These sparser relevance attributions in the intermediate layers may allow a better understanding of the latent factors of the model.

For image classification, a human evaluation and interpretation of the LRP explanation can be done qualitatively by a suitable visualization such as heatmaps. The observer can identify the object to be detected and decide whether the corresponding relevance attribution split across the input dimensions matches this object. However, for high-dimensional data with unknown ground truth, this might no longer be feasible. For example, this is often the case for genome sequences. Besides the high dimensionality, noisy relevance attributions can make it difficult to identify and interpret the main drivers for the model’s prediction in such data and hence diminish the meaningfulness of the explanation. As a solution, we propose a modification of LRP that generates sparser explanations by extending the idea of filtering or pruning the relevance propagation (Montavon et al., 2018). We call our approach pruned layer-wise relevance propagation (PLRP). Sparsification of the explanation might be desirable in the sense that it reduces noise and the number of features with non-zero relevance, i.e., highlights only the most important features. Furthermore, our method is able to regulate the degree of sparsity, which can be beneficial for feature selection or the identification of concepts learned by the network in deeper layers. The basic intuition of our method is illustrated in Figure 1(a) via three heatmaps of images obtained through LRP (left) and their corresponding sparsified versions through PLRP (right).

Further, Figure 1(b) shows how the relevance propagation in our proposed method PLRP is pruned within a DNN by removing less relevant neurons. This idea of maintaining only the most important features is inspired by $L_{1}$-regularization. However, $L_{1}$-regularization cannot be integrated into LRP directly. This is because the map representing the relevance propagation from one layer to another, defined by LRP, is input-specific. Hence, there is no single consistent map for all data points. Approximating all inputs with the same model would be a strong simplification of the original explanation method. It would also disregard the fact that different inputs might activate different neurons in intermediate layers. Altogether, this motivates our contribution of PLRP.

2 Related Work

Since LRP was introduced by Bach et al. (2015), several modifications of it have been proposed. Some work focuses on creating class-discriminative explanations by not only creating relevance attributions for the target class. Gu et al. (2019) and Iwana et al. (2019) propose modifications where, also for non-target classes, the negative prediction score is propagated backward. Hence, the final explanation is the difference between target and non-target classes. Similarly, Montavon et al. (2019) point out that instead of the prediction score, one could also propagate the log-probability ratios to achieve class-discriminative explanations. However, Jung et al. (2021) show that some of these modifications exhibit a so-called erasing object problem. This means that when considering differences in explanations, some relevance scores might cancel each other out and thereby potentially falsely erase relevant features from the explanation. To overcome this issue, they introduce a modification of LRP by constraining the relevance propagation to neurons with positive gradient. Hence, when propagating backward, the relevance of some neurons is set to zero. Montavon et al. (2018) also present this idea of filtering the propagation through certain neurons. This is also used by Achtibat et al. (2022) to create more human-understandable explanations, assuming there is some understanding of the representation of data in the latent model space, i.e., the roles of the learned latent factors. They first identify these factors globally for the model and then restrict the propagation of relevance through them locally per input sample. Chormai et al. (2022) propose a numerical approach to identify relevant subspaces of the latent space for models with unknown roles of latent factors. These subspaces make it possible to create disentangled explanations that decompose the explanation into different components, which can then be visualized and interpreted.

Although related to model pruning (Zhu and Gupta, 2017), our approach and goal of pruning relevance propagation conceptually differ from it. Instead of pruning the underlying model, we aim to sparsify the individual explanations, which are local approximations of the model. Hence, we prune different neurons depending on the input features, thereby enabling a local and input-specific pruning. In contrast, when Gupta et al. (2021) and Yeom et al. (2021) applied LRP for model pruning, they used the relevance attribution aggregated over multiple inputs to prune the model globally. The authors show that the number of parameters can be reduced drastically, while still maintaining or even improving the predictive performance of the model.

3 Pruned Layer-Wise Relevance Propagation (PLRP)

We propose to extend the idea of filtering the propagation of relevance through certain neurons (Montavon et al., 2018) by using the relevance itself as filter criterion. To the best of our knowledge, none of the previous works focused on creating sparse explanations in the sense of locally reducing the input features and latent factors to the most important ones and concentrating relevance mass there. As the explanations are input-specific, a local pruning of relevance propagation is more appropriate for our goal than a global pruning of the model. This motivates our pruned layer-wise relevance propagation (PLRP).

3.1 Setting

Let $f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{C},\boldsymbol{x}=(x_{1},\ldots,x_{d})^{\top}\mapsto f(\boldsymbol{x})=(f_{1}(\boldsymbol{x}),\ldots,f_{C}(\boldsymbol{x}))^{\top}$ be the pre-softmax prediction function of a trained DNN for a classification task with $C$ classes, which we also refer to as the model. For our setting, we are only interested in the prediction score of the winning class, denoted by $f_{c^{*}}(\boldsymbol{x})$. For a given model, an explanation method is a function $e:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}:\boldsymbol{x}\mapsto e(\boldsymbol{x})=(r_{1},\ldots,r_{d})^{\top}$ that assigns a relevance score $r_{i}$ to every input dimension $i=1,\ldots,d$. We denote by $W_{l}\in\mathbb{R}^{m\times n}$ the weight matrix of the $l$-th layer of the model and by $w_{jk}^{(l)}$ the weight of the edge connecting the $j$-th neuron in layer $l-1$ with the $k$-th neuron in layer $l$. The corresponding activations are denoted by $a_{j}^{(l-1)}$ and $a_{k}^{(l)}$, respectively. Hence, we can write $\boldsymbol{a}^{(l)}=\big(W^{\top}_{l}\boldsymbol{a}^{(l-1)}\big)^{+}$, where $(\cdot)^{+}$ refers to the entry-wise application of the ReLU function. Since for our purposes we consider only one layer at a time, we will drop the layer-specific superscripts for readability and simply write $w_{jk}$. Furthermore, let $(\cdot)^{-}$ be the function that only keeps the absolute value of the negative part and is zero otherwise.

Finally, for the relevance attribution, we refer to the relevance of the $k$ -th neuron in layer $l$ as $r_{k}^{(l)}$ and write $\boldsymbol{r}^{(l)}=(r_{1}^{(l)},\ldots,r_{n}^{(l)})^{\top}$ , in particular $\boldsymbol{r}^{(0)}=\boldsymbol{r}=e(\boldsymbol{x})$ . For the last layer $L$ by definition, we have $r_{c^{*}}^{(L)}:=f_{c^{*}}(\boldsymbol{x})$ and $r_{i}^{(L)}:=0,i\neq c^{*}$ .

As described earlier, LRP propagates the relevance backward from one layer to its predecessor, starting with the prediction score of the output layer $f_{c^{*}}(\boldsymbol{x})$. The total relevance mass is preserved in every layer by the so-called conservation property (Bach et al., 2015). The relevance propagation from layer $l$ to $l-1$ can be expressed by a linear map (Sixt et al., 2020), i.e., a multiplication with the matrix $M_{l}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}:\boldsymbol{r}^{(l)}\mapsto\boldsymbol{r}^{(l-1)}=M_{l}\boldsymbol{r}^{(l)}$. Thereby, the matrix entry in the $j$-th row and $k$-th column represents the proportion of relevance of neuron $k$ in layer $l$ that is allocated to neuron $j$ in layer $l-1$. As for the weights, we will disregard the layer subscript and only write $M$. There are different LRP rules for how these proportions are calculated (Montavon et al., 2018). Specifically, for the LRP-0 rule, the matrix $M$ is given by

\[
M=\begin{bmatrix}\frac{a_{j}w_{jk}}{\sum_{j=1}^{m}a_{j}w_{jk}}\end{bmatrix}_{j=1,\ldots,m,\;k=1,\ldots,n}\,.
\tag{3.1}
\]

Note that by definition, the columns of $M$ sum to one, which corresponds to redistributing the entire relevance of a neuron. For choosing a rule that yields measurably appropriate relevance attributions, we follow the composite strategy of Kohlbrenner et al. (2020) that combines various rules for different layer types.
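The LRP-0 map of Equation (3.1) can be illustrated with a minimal NumPy sketch. It assumes a single dense layer without bias; the function name `lrp0_matrix`, the toy values, and the guard against zero denominators are assumptions made for this illustration and are not taken from the authors' implementation.

```python
import numpy as np

def lrp0_matrix(a_prev, W, eps=1e-12):
    """Build M with M[j, k] = a_j * w_jk / sum_j a_j * w_jk (Equation 3.1).

    a_prev: activations of layer l-1, shape (m,)
    W:      weights w_jk, shape (m, n)
    """
    z = a_prev[:, None] * W                 # contributions a_j * w_jk
    denom = z.sum(axis=0)                   # one denominator per neuron k
    dead = np.abs(denom) < eps              # guard added for this sketch only
    M = z / np.where(dead, 1.0, denom)
    M[:, dead] = 0.0
    return M

a_prev = np.array([1.0, 2.0, 0.0])          # toy activations (hypothetical)
W = np.array([[0.5, -1.0],
              [1.0,  2.0],
              [3.0,  1.0]])
M = lrp0_matrix(a_prev, W)

r_upper = np.array([0.7, 0.3])              # relevance in layer l
r_lower = M @ r_upper                       # relevance in layer l-1
```

In this toy example every column of `M` sums to one, so `r_lower` carries the same total relevance as `r_upper`, and the neuron with zero activation receives zero relevance.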

3.2 Pruned LRP

The goal of pruned layer-wise relevance propagation (PLRP) is to create sparser explanations by reducing noise and simultaneously increasing the attribution to the most relevant features. We do this by pruning the explanation directly in every layer. This consists of two steps: first, determining the neurons to be pruned, and second, redistributing their relevance mass among the remaining neurons. The criterion for pruning the neurons is the relevance itself, i.e., the neurons with the lowest relevance scores get pruned. Hence, first, we conduct a preliminary unpruned relevance attribution to layer $l$ , prune it, and then after redistribution, use it as input for the attribution to layer $l-1$ . As it is unclear in advance how the relevance is distributed among the neurons, we prune a proportion of the total relevance mass instead of a proportion of the neurons.

By redistributing the pruned relevance mass among the remaining neurons, we maintain the relevance conservation of LRP. As we are potentially interested in strong negative contributions, we consider the positive and negative relevance separately, i.e., we aim to prune neurons with small relevance in absolute terms. Hence, in the following by $\boldsymbol{r}^{(l)}$ we either mean $\big(\boldsymbol{r}^{(l)}\big)^{+}$ or $\big(\boldsymbol{r}^{(l)}\big)^{-}$. Note that the proportion of pruned mass can be different for the positive and negative parts of the relevance vector. We do not prune the relevance propagation in the last step, i.e., the first layer is left unpruned. Pruning there would correspond to a simple thresholding of the input, because the relevance is not propagated through any further layers, and is thus neither desired nor meaningful.

3.2.1 Pruning Relevance

We denote the unpruned relevance vector by $\widetilde{\boldsymbol{r}}^{(l-1)}$. Then for a pre-defined proportion $p_{l}\in[0,1]$ of the total relevance mass that is supposed to be pruned, we determine the corresponding threshold $\theta_{l}\in\mathbb{R}$, such that the relevance scores below $\theta_{l}$ sum up to $p_{l}$ of the total relevance mass. Hence, for (unpruned) ordered relevance scores $\widetilde{r}_{s_{1}}^{(l-1)}\leq\widetilde{r}_{s_{2}}^{(l-1)}\leq\ldots\leq\widetilde{r}_{s_{n}}^{(l-1)}$, we set

\[
\theta_{l}:=\max_{1\leq i^{*}\leq n}\Bigg\{\widetilde{r}_{s_{i^{*}}}^{(l-1)}\,\Bigg|\,\sum_{i=1}^{i^{*}}\widetilde{r}_{s_{i}}^{(l-1)}\leq p_{l}\sum_{i=1}^{n}\widetilde{r}_{i}^{(l-1)}\Bigg\}\,.
\tag{3.2}
\]
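The threshold of Equation (3.2) can be sketched in a few lines of NumPy. The sketch assumes a non-negative relevance vector (i.e., either the positive or the negative part), the function name `pruning_threshold` is hypothetical, and the fallback to zero for an empty admissible set follows the convention stated in the text.

```python
import numpy as np

def pruning_threshold(r_tilde, p):
    """Largest sorted relevance whose cumulative mass stays within p * total mass."""
    r_sorted = np.sort(np.asarray(r_tilde, dtype=float))   # ascending order
    cumulative = np.cumsum(r_sorted)
    admissible = cumulative <= p * r_sorted.sum()
    if not admissible.any():                                # empty set in Eq. (3.2)
        return 0.0
    return float(r_sorted[admissible].max())

r_tilde = np.array([3.0, 1.0, 14.0, 2.0])                   # toy relevance (hypothetical)
theta = pruning_threshold(r_tilde, p=0.2)                   # budget 0.2 * 20 = 4
kept = np.where(r_tilde > theta, r_tilde, 0.0)              # entries above the threshold
```

Here the two smallest entries (total mass 3) fit into the budget of 4, so the threshold is 2 and the entries 3 and 14 are kept, which retains at least the proportion $1-p_{l}$ of the mass.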

If $p_{l}$ leads to an empty set in (3.2), we set $\theta_{l}:=0$ . Hence, we obtain the desired relevance pruning as

\[
\left\|\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\geq(1-p_{l})\left\|\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\,,
\]

where $\odot$ denotes the element-wise multiplication and $\boldsymbol{1}_{\{\,\boldsymbol{\cdot}\,>\theta_{l}\}}$ is the element-wise application of the indicator function. Hence, we only keep the entries where the neuron’s relevance lies above the threshold $\theta_{l}$. Every $p_{l}$ can be seen as a hyperparameter that could be tuned. However, for dense networks, this can quickly become computationally expensive. Further, as the relevance attribution is input-specific, the optimal parameter might differ significantly for different samples. Therefore, we consider an additional approach that determines $\theta_{l}$, and hence $p_{l}$, after the relevance propagation in every layer. As we are interested in sparse relevance vectors, we consider the so-called sparsity gain, which measures the additional degree of sparsity obtained per unit of pruned relevance. The idea is that neurons with small relevance yield a high gain, while the gain decreases for neurons that concentrate mass, as more mass must be pruned for an additional degree of sparsity. Since the total relevance may differ for different inputs, we consider the relative pruning, i.e., the proportion $p_{l}$. To define the gain, consider a real-valued vector $v_{0}$. We iteratively sparsify $v_{0}$ by setting some entries to zero. Let $v^{\prime}$ denote the vector after setting some entries in $v_{0}$ to zero and $v^{\prime\prime}$ the vector after further entries in $v^{\prime}$ were set to zero. Then for some sparsity measure $s$ that increases with respect to the degree of sparsity, the sparsity gain for setting $v^{\prime}$ to $v^{\prime\prime}$ is defined as

\[
\Delta_{SG}:=\frac{\Delta s}{\Delta p_{l}}=\|v_{0}\|_{1}\frac{s(v^{\prime\prime})-s(v^{\prime})}{\|v^{\prime}\|_{1}-\|v^{\prime\prime}\|_{1}}\,.
\tag{3.3}
\]

When the entries of $v_{0}$ are set to zero iteratively in order of increasing value, the sparsity gain decreases monotonically. Hence, a minimal allowed sparsity gain $\Delta_{SG}$ determines a proportion of pruned mass $p_{l}$ and thus a threshold $\theta_{l}$. Therefore, the approach of sparsity gain makes PLRP independent of choosing fixed proportions $p_{l}$ in advance. Instead, every $p_{l}$ is determined during the process of pruning. This procedure is related to considering a minimal or maximal change of a function, i.e., its derivative. Nevertheless, a minimal sparsity gain still has to be chosen. A natural choice would be 1, i.e., pruning stops when an additional degree of sparsity becomes more expensive than one unit of pruned mass.
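The stopping rule based on a minimal sparsity gain can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: the sparsity measure $s$ is taken to be the fraction of zero entries, entries are pruned one at a time, the additional pruned proportion $\Delta p_{l}$ is counted as a positive quantity, and the names `zero_fraction` and `prune_by_sparsity_gain` are hypothetical.

```python
import numpy as np

def zero_fraction(v):
    """Assumed sparsity measure s: the fraction of exactly-zero entries."""
    return float(np.mean(v == 0))

def prune_by_sparsity_gain(v0, min_gain=1.0, sparsity=zero_fraction):
    """Zero out entries of a non-negative vector in increasing order
    while the sparsity gain of each step stays at or above min_gain."""
    v = np.array(v0, dtype=float)
    total = v.sum()
    pruned = 0.0
    for idx in np.argsort(v):                 # smallest entries first
        if v[idx] == 0.0:
            continue                          # already sparse, nothing to prune
        candidate = v.copy()
        candidate[idx] = 0.0
        delta_s = sparsity(candidate) - sparsity(v)
        delta_p = v[idx] / total              # additional pruned proportion (> 0)
        if delta_s / delta_p < min_gain:
            break                             # further sparsity is too expensive
        v, pruned = candidate, pruned + delta_p
    return v, pruned

v_pruned, p_l = prune_by_sparsity_gain([3.0, 1.0, 14.0, 2.0], min_gain=1.0)
```

With these toy values the per-step gains are 5, 2.5, and about 1.67 for the three smallest entries and about 0.36 for the largest one, so a minimal gain of 1 prunes 30% of the mass and keeps only the dominant entry.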

3.2.2 Redistributing Pruned Relevance

The pruned relevance is redistributed among the remaining neurons with non-zero relevance. We consider two possible approaches. The first one is based on the relative proportion of the neuron’s relevance after pruning. This is equivalent to a rescaling by a factor $\lambda$, such that the total relevance mass sums up again to the value before pruning. We refer to this procedure as PLRP-$\lambda$. For this, define

\[
\boldsymbol{r}^{(l-1)}:=\lambda_{l}\,\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\,,
\]

with

\[
\lambda_{l}=\left\|\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\,\Big/\,\left\|\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\approx\frac{1}{1-p_{l}}\,.
\]
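The rescaling step of PLRP-$\lambda$ can be sketched as below, assuming a non-negative relevance vector and a threshold $\theta_{l}$ that has already been determined. The function name `plrp_lambda` and the guard for the case that no entry survives are assumptions of this sketch.

```python
import numpy as np

def plrp_lambda(r_tilde, theta):
    """Prune entries at or below theta and rescale the rest to the original mass."""
    r_tilde = np.asarray(r_tilde, dtype=float)
    kept = np.where(r_tilde > theta, r_tilde, 0.0)   # indicator times relevance
    kept_mass = kept.sum()
    if kept_mass == 0.0:                             # guard added for this sketch
        return kept
    lam = r_tilde.sum() / kept_mass                  # rescaling factor lambda_l
    return lam * kept

r = plrp_lambda([3.0, 1.0, 14.0, 2.0], theta=2.0)    # toy values (hypothetical)
```

The pruned entries are zero, the total mass of 20 is conserved, and the ratio between the surviving entries is unchanged because they are all multiplied by the same positive factor.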

For the second approach, denoted by PLRP-$M$, we make use of the fact that LRP assigns zero relevance to neurons with zero activation, i.e., $a_{j}^{(l)}=0\Rightarrow r_{j}^{(l)}=0$. Therefore, (re-)applying LRP with more zero activations leads to sparser relevance attribution without the necessity of rescaling the relevance mass. We do this by modifying the matrix $M$ expressing the relevance propagation from Equation (3.1). Given the unpruned relevance vector $\widetilde{\boldsymbol{r}}^{(l-1)}$, we define the modified activations by

\[
\widehat{\boldsymbol{a}}^{(l-1)}:=\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\boldsymbol{a}^{(l-1)}
\tag{3.4}
\]

and hence, the modified relevance attribution as

\begin{align}
\widehat{M}_{l}&:=\begin{bmatrix}\frac{\widehat{a}_{j}^{(l-1)}w_{jk}^{(l)}}{\sum_{j=1}^{m}\widehat{a}_{j}^{(l-1)}w_{jk}^{(l)}}\end{bmatrix}_{j=1,\ldots,m,\;k=1,\ldots,n} \tag{3.5}\\
\boldsymbol{r}^{(l-1)}&:=\widehat{M}_{l}\boldsymbol{r}^{(l)}\,. \tag{3.6}
\end{align}

Hence, by attributing relevance according to $\widehat{M}_{l}$, we ensure that neurons which would otherwise have been assigned a relevance mass below the threshold $\theta_{l}$ of Equation (3.2) receive zero relevance. As $\widehat{M}_{l}$ is also a matrix in the form of Equation (3.1), only with more zero activations, the pruned relevance gets implicitly redistributed in the manner of LRP.
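Equations (3.4) to (3.6) can be sketched for a single dense layer as follows. The sketch assumes the LRP-0 rule without bias and a given threshold; the helper names `lrp0_matrix` and `plrp_m_step` as well as the toy values are hypothetical and do not reflect the authors' Zennit-based implementation.

```python
import numpy as np

def lrp0_matrix(a_prev, W):
    """M[j, k] = a_j * w_jk / sum_j a_j * w_jk (assumes non-zero denominators)."""
    z = a_prev[:, None] * W
    return z / z.sum(axis=0)

def plrp_m_step(a_prev, W, r_upper, theta):
    """One PLRP-M step: mask activations by unpruned relevance, then re-apply LRP."""
    r_tilde = lrp0_matrix(a_prev, W) @ r_upper        # unpruned relevance
    a_hat = np.where(r_tilde > theta, a_prev, 0.0)    # Equation (3.4)
    M_hat = lrp0_matrix(a_hat, W)                     # Equation (3.5)
    return M_hat @ r_upper                            # Equation (3.6)

a_prev = np.array([1.0, 2.0, 0.5])                    # toy activations (hypothetical)
W = np.array([[0.5, 1.0],
              [1.0, 2.0],
              [0.2, 0.4]])
r_upper = np.array([0.6, 0.4])
r_lower = plrp_m_step(a_prev, W, r_upper, theta=0.05)
```

In this example the third neuron receives an unpruned relevance of roughly 0.04, falls below the threshold, and ends up with zero relevance, while the total mass of 1 is conserved without an explicit rescaling factor.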

4 Evaluation

We evaluate the two variants of PLRP in the context of classification of two different data types: images and genomic sequences. For the quantitative evaluation, we use multiple metrics covering different aspects of the explanation method following the categorization of Hedström et al. (2023). Further, we evaluate our methods qualitatively by considering heatmaps and sequence logo plots of the relevance attributions.

4.1 Evaluation Metrics

Sparsity and localization are intended to measure the efficacy of our method, while faithfulness and robustness serve as sanity checks for its trustworthiness.

Sparsity

As we are interested in sparse explanations that concentrate the relevance mass at few important features, we evaluate the degree of sparsity or complexity by the Gini Index and entropy of the relevance attribution as proposed by Chalasani et al. (2020) and Bhatt et al. (2020). Thereby, a higher Gini Index or a lower entropy indicates that there are more zero entries and that more mass is concentrated at fewer entries.
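Both sparsity measures can be sketched directly on the absolute relevance values. The exact normalization used by the evaluation tooling is not stated here, so the Gini formula below (0 for a uniform vector, $1-1/n$ for a one-hot vector of length $n$) and the use of the natural logarithm for the entropy are assumptions of this sketch.

```python
import numpy as np

def gini_index(r):
    """Gini index of the absolute attribution values (higher means sparser)."""
    a = np.sort(np.abs(np.ravel(r)))          # ascending order
    n, total = a.size, a.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(np.sum((2 * k - n - 1) * a) / (n * total))

def attribution_entropy(r):
    """Shannon entropy of the normalized absolute attribution (lower means sparser)."""
    p = np.abs(np.ravel(r))
    total = p.sum()
    if total == 0:
        return 0.0
    p = p[p > 0] / total                      # drop zeros, since 0 * log(0) := 0
    return float(-np.sum(p * np.log(p)))

uniform = np.ones(4)
one_hot = np.array([0.0, 0.0, 0.0, 1.0])
```

For the uniform vector the Gini index is 0 and the entropy is $\log 4$; for the one-hot vector the Gini index is 0.75 and the entropy is 0, which matches the direction of both measures described above.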

Localization

Localization measures how accurately the relevance is attributed to a pre-specified region. Hence, for this metric, a known ground truth is needed. We therefore calculate the Relevance Mass Accuracy (RMA; Arras et al., 2022), which measures how much relevance mass out of the total mass is allocated to the ground truth.
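As a minimal sketch, the RMA can be computed as the ratio of the relevance inside a binary ground truth mask to the total relevance. How negative relevance is handled is not specified here, so the example assumes a non-negative attribution; the function name and toy values are hypothetical.

```python
import numpy as np

def relevance_mass_accuracy(r, mask):
    """Share of the total relevance mass that falls inside the ground truth mask."""
    r = np.asarray(r, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    total = r.sum()
    return float(r[mask].sum() / total) if total != 0 else 0.0

r = np.array([[0.1, 0.4],
              [0.4, 0.1]])                    # toy non-negative attribution
mask = np.array([[False, True],
                 [True, False]])              # toy ground truth mask
rma = relevance_mass_accuracy(r, mask)        # 0.8 of the mass lies in the mask
```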

Faithfulness

Faithfulness measures whether the explanation method indeed uncovers the predictive behavior of the underlying model. This means that the highest relevance scores correspond to the most decisive input features. The idea is to perturb the input based on the relevance attribution and then measure how quickly the predictive score drops. This procedure is also referred to as pixel-flipping (Bach et al., 2015) or selectivity (Samek et al., 2016). Hence, we have a perturbation function that takes the model input $\boldsymbol{x}$ and corresponding relevance attribution $\boldsymbol{r}=e(\boldsymbol{x})$ and outputs a sequence of perturbed versions of $\boldsymbol{x}$ based on $\boldsymbol{r}$, i.e., $(\boldsymbol{x},\boldsymbol{r})\mapsto(\boldsymbol{x}_{1}^{\prime},\ldots,\boldsymbol{x}_{s}^{\prime})$ where $s$ denotes the number of perturbation steps. As the actual value of prediction scores can differ from one class to another, we normalize the scores and measure the relative drop. Further, to compare the results across different pruning parameters $p_{l}$, we also consider the area under the curve (AUC). Note that a quicker decrease in the prediction score corresponds to a lower AUC and hence to a better explanation with respect to this metric.
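The perturbation procedure and its AUC can be sketched on a toy model. The linear stand-in model, the replacement of perturbed features by a zero baseline, and the perturbation of one feature per step are assumptions of this sketch and are not the setup used in the experiments.

```python
import numpy as np

def pixel_flipping_curve(model, x, r, baseline=0.0):
    """Perturb features in order of decreasing relevance; record normalized scores."""
    x_pert = np.array(x, dtype=float)
    f0 = model(x_pert)                        # unperturbed prediction score
    scores = [1.0]
    for idx in np.argsort(-np.asarray(r)):    # most relevant feature first
        x_pert[idx] = baseline
        scores.append(model(x_pert) / f0)
    return np.array(scores)

def area_under_curve(scores):
    """Trapezoidal AUC with the perturbation steps rescaled to [0, 1]."""
    steps = len(scores) - 1
    return float(np.sum((scores[:-1] + scores[1:]) / 2.0) / steps)

w = np.ones(4)
model = lambda x: float(w @ x)                # hypothetical stand-in for the class score
x = np.array([4.0, 3.0, 2.0, 1.0])

auc_informed = area_under_curve(pixel_flipping_curve(model, x, r=w * x))
auc_reversed = area_under_curve(pixel_flipping_curve(model, x, r=-(w * x)))
```

Perturbing in the order given by the true contributions yields an AUC of 0.375, whereas the reversed order yields 0.625, illustrating that a lower AUC corresponds to a more faithful ranking.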

Robustness

For explanation methods, robustness refers to the idea that similar inputs should produce similar explanations. Hence, it can be seen as a sensitivity proxy that measures how much the explanation changes for a small shift in the input. Alvarez-Melis and Jaakkola (2018) propose to estimate the local Lipschitz constant, i.e., for $\varepsilon>0$ and some input $\boldsymbol{x}\in\mathbb{R}^{d}$, the aim is to find a constant $L=L(\boldsymbol{x})$ such that for all $\boldsymbol{x}^{\prime}\in B_{\varepsilon}(\boldsymbol{x})=\{\boldsymbol{x}^{\prime}:\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|<\varepsilon\}$ it holds that $\|e(\boldsymbol{x})-e(\boldsymbol{x}^{\prime})\|<L\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|$. Hence, $L$ is the steepest possible ascent of $e$ in a small neighborhood of $\boldsymbol{x}$. As calculating the gradient of the explanation method might be very expensive, or it might not even be differentiable, $L$ is estimated numerically by sampling from $B_{\varepsilon}(\boldsymbol{x})$.
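A sampling-based estimate of the local Lipschitz constant can be sketched as follows. The uniform sampling scheme inside the ball, the Euclidean norm, the sample count, and the toy explanation function are assumptions of this sketch; the evaluation package used in the experiments may sample differently.

```python
import numpy as np

def local_lipschitz_estimate(explain, x, eps=0.1, n_samples=100, seed=0):
    """Maximum ratio ||e(x) - e(x')|| / ||x - x'|| over samples x' in B_eps(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    e_x = explain(x)
    best = 0.0
    for _ in range(n_samples):
        direction = rng.standard_normal(x.shape)
        direction /= np.linalg.norm(direction)
        radius = eps * rng.uniform() ** (1.0 / x.size)   # uniform inside the ball
        x_prime = x + radius * direction
        ratio = np.linalg.norm(e_x - explain(x_prime)) / np.linalg.norm(x - x_prime)
        best = max(best, ratio)
    return best

w = np.array([1.0, 2.0, 3.0])
explain = lambda x: w * x                     # hypothetical toy explanation method
L_hat = local_lipschitz_estimate(explain, np.array([0.5, -1.0, 2.0]))
```

For this toy explanation the true constant is the largest absolute weight, 3, and the sampled estimate approaches it from below as the number of samples grows.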

4.2 Experiments

4.2.1 Image Classification

Model Specification

We perform our evaluation on the ImageNet dataset (Russakovsky et al., 2015) and the Extended Complex Scene Saliency Dataset (ECSSD) (Shi et al., 2015). Thereby, we consider two established model architectures: VGG-16 (Simonyan and Zisserman, 2015) and ResNet-50 (He et al., 2016), both convolutional neural networks (CNNs). The ground truth masks of ECSSD are more granular than the bounding boxes of ImageNet, which contain more pixels than the actual ground truth object. Therefore, ECSSD is more suitable for measuring localization. However, for comparison and completeness, we measure sparsity, faithfulness, and robustness also on the widely used ImageNet dataset. Thereby, we use a sample of 3,900 images of its validation set covering all classes. As we must draw multiple samples to calculate the Lipschitz estimate for a single input for robustness analyses, we restrict the number of input samples to 10. The evaluation includes both of our approaches, PLRP-$\lambda$ and PLRP-$M$, with and without sparsity gain.

Our implementation of PLRP (code available at https://anonymous.4open.science/r/plrp-FD34) is based on the Zennit package (Anders et al., 2021). For the evaluation, we use the Quantus package (Hedström et al., 2023) that implements several evaluation metrics. For evaluating faithfulness, we use a feature of the iNNvestigate package (Alber et al., 2019) that perturbs the input $\boldsymbol{x}$ based on its explanation $e(\boldsymbol{x})$.

Results

We observe that our approach creates sparser explanations while maintaining the most relevant features. Figure 2 illustrates that the Gini Index increases drastically for all our approaches compared to the LRP baseline. Simultaneously, the RMA increases, indicating that after pruning and redistributing, a higher proportion of relevance falls within the ground truth mask. In both metrics, we observe a steep increase already for small proportions ($<0.2$) of pruned relevance $p$. For both models, VGG-16 and ResNet-50, the simpler approach PLRP-$\lambda$ with and without sparsity gain leads to better results compared to PLRP-$M$ and the LRP baseline. We also find that the RMA starts decreasing again for higher $p$. This is not surprising as, at some point, the most decisive features might also get pruned. These findings are in line with what we observe qualitatively by visualization (see Figure 4), i.e., PLRP-$\lambda$ produces sparser explanations than PLRP-$M$ for the same pruning parameter $p$, and for higher $p$, relevance within the ground truth mask is also pruned. The entropy shows completely analogous results to the Gini Index and hence is not displayed here.

With respect to faithfulness, except for one approach (PLRP-$M$ with and without sparsity gain and $p<0.3$), we observe slightly worse results than the baseline. Note that better results correspond to a lower AUC as we are interested in a quick drop in the prediction score. However, we observe that the higher AUC is mainly driven by less important features, i.e., we have a less steep decrease in the prediction score for later perturbed and hence less important features. This is in line with our goal of maintaining only the most important features. In fact, for the features with the highest relevance, the prediction score drops about as steeply as for the baseline (see Figure 3). Overall, the relative decline in faithfulness is small compared to the relative gain in sparsity and localization, especially for small $p$.

Finally, PLRP-$\lambda$ is always more robust than the LRP baseline. Only PLRP-$M$ is less robust than the baseline for a few parameterizations. Therefore, the explanations of our approach can be seen as reliable and stable.

Overall, PLRP-$\lambda$ is able to increase sparsity more quickly and concentrate more relevance within the ground truth mask than PLRP-$M$, i.e., there is a steeper increase for the same proportion of pruned relevance. It appears that redistributing the pruned relevance in the manner of LRP as done by PLRP-$M$ slows down the concentration of relevance. This is also in line with PLRP-$M$ sticking closer to the baseline with respect to faithfulness. Another driver for the better results of PLRP-$\lambda$ over PLRP-$M$ with respect to sparsity, localization, and robustness appears to be that PLRP-$M$ can lead to a sign flipping of a neuron’s relevance (see negative relevance for PLRP-$M$ in Figure 4(d)). PLRP-$M$ changes the proportions of how much relevance is attributed from one neuron to another (see Equation (3.5)). Since this might increase the proportion of incoming negative relevance or decrease it for incoming positive relevance, respectively, the sign of the total relevance a neuron receives might switch. Furthermore, the sign of the proportions themselves might change. PLRP-$\lambda$, on the other hand, only rescales the relevance by a positive factor and hence preserves the initial sign of the relevance score.

The quantitative results for the ImageNet dataset for sparsity, faithfulness, and robustness are analogous to the results for ECSSD and can be found in the supplementary material in section A.

4.2.2 Genomics

Model Specification

Next, we apply PLRP to explain predictions of CNNs trained on synthetic genome sequence data with known ground truth (Lemanczyk et al., 2024). Specifically, the input consists of strings of length 250 from the alphabet $\{A,C,G,T\}$ representing the four nucleotide bases of DNA. The sequences are one-hot-encoded. Hence, our input domain is $\{0,1\}^{4\times 250}$. Thereby, we consider two different model architectures: one with only four filters and another one with 32. As we operate on a discrete domain, slight changes in the input or perturbations are not well-defined. Therefore, we only calculate the metrics for sparsity and localization.
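The input encoding can be sketched as follows. The row order A, C, G, T and the function name `one_hot_encode` are assumptions of this sketch; the short example sequence is made up, whereas the sequences described above have length 250.

```python
import numpy as np

ALPHABET = "ACGT"                             # assumed row order of the encoding

def one_hot_encode(sequence):
    """Encode a DNA string as a binary matrix of shape (4, len(sequence))."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(ALPHABET), len(sequence)), dtype=np.uint8)
    for pos, base in enumerate(sequence):
        encoded[index[base], pos] = 1         # exactly one active row per position
    return encoded

x = one_hot_encode("ACGTTGCA")                # toy sequence (hypothetical)
```

Each column contains exactly one non-zero entry, which is why small continuous perturbations of such an input leave the valid input domain.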

Results

Similar to the task of image classification, both PLRP variants produce sparser explanations compared to the LRP baseline. We observe qualitatively (see Figure 5) that noise is reduced and relevance is concentrated at the ground truth patterns in the genome sequence (motifs). Similar to the application to image classification, we find that, in general, the RMA increases for an increasing proportion of pruned relevance $p$ . Hence, the more we prune and redistribute, the more relevance lies within the ground truth mask. Simultaneously, sparsity rises, while complexity is reduced as indicated by an increasing Gini Index.

Figure 5 also shows that this noise reduction indeed differs from simple thresholding, as it maintains relevance scores within the ground truth mask that are smaller than the noise signal at other irrelevant features.

The results of PLRP-$\lambda$ and PLRP-$M$ only differ slightly. This might be due to the shallower networks used here compared to the image classification. The fewer layers there are, the less both variants can diverge from each other. The quantitative results for the genomics application can be found in the supplementary material in section B.

5 Conclusion

In this work, we introduced PLRP, a modification of LRP that enforces sparsity directly by locally pruning the relevance propagation for the different layers. We presented two approaches to redistribute the pruned relevance mass so that the conservation property is preserved: the simpler PLRP-$\lambda$ and the more complex PLRP-$M$, which is more in line with the original LRP methodology. The evaluation on the ECSSD and ImageNet datasets shows that we indeed obtain sparser explanations than the LRP baseline. The pruning leads to only a slight decrease in faithfulness, mainly driven by the least important features. By measuring localization, we show that, while explanations become sparser, more relevance is attributed to features within the ground truth mask. This demonstrates the efficacy of PLRP compared to LRP. In fact, both our approaches lead to noise reduction and concentrate relevance at the most important features. Qualitative evaluation by visualization supports this claim. Furthermore, our approach is similarly robust to small changes in the input as the LRP baseline and, depending on the parameterization, even more robust. For the genomics application, we observe similar effects. We obtain sparser explanations with a higher concentration of relevance on the most important features in the genome sequence, aiding in the interpretation of the model outputs. Overall, PLRP-$\lambda$ produces better results for our goal of sparsity and relevance concentration than PLRP-$M$. This is partially driven by the effect of relevance sign flipping. It should be investigated whether this effect can be reduced by further modification. Additionally, how to define and find an optimal parameterization for PLRP remains an open question. This might heavily depend on the underlying model but also on the “optimal” degree of sparsity for a specific task.
Finally, as PLRP prunes the relevance attribution not only for the input features but also for neurons in intermediate layers, it could be used to study what concepts a model has learned in deeper layers.

References

Achtibat et al., (2022) Achtibat, R., Dreyer, M., Eisenbraun, I., Bosse, S., Wiegand, T., Samek, W., and Lapuschkin, S. (2022). From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation. arXiv preprint arXiv:2206.03208.
Alber et al., (2019) Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K. T., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., and Kindermans, P.-J. (2019). iNNvestigate Neural Networks! Journal of Machine Learning Research, 20(93):1–8.
Ali et al., (2022) Ali, A., Schnake, T., Eberle, O., Montavon, G., Müller, K.-R., and Wolf, L. (2022). XAI for transformers: Better explanations through conservative propagation. In International Conference on Machine Learning, pages 435–451. PMLR.
Alvarez-Melis and Jaakkola, (2018) Alvarez-Melis, D. and Jaakkola, T. S. (2018). On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049.
Anders et al., (2021) Anders, C. J., Neumann, D., Samek, W., Müller, K.-R., and Lapuschkin, S. (2021). Software for Dataset-wide XAI: From Local Explanations to Global Insights with Zennit, CoRelAy, and ViRelAy. CoRR, abs/2106.13200.
Arras et al., (2017) Arras, L., Montavon, G., Müller, K.-R., and Samek, W. (2017). Explaining recurrent neural network predictions in sentiment analysis. arXiv preprint arXiv:1706.07206.
Arras et al., (2022) Arras, L., Osman, A., and Samek, W. (2022). CLEVR-XAI: a benchmark dataset for the ground truth evaluation of neural network explanations. Information Fusion, 81:14–40.
Bach et al., (2015) Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE, 10.
Bartoszewicz et al., (2021) Bartoszewicz, J. M., Seidel, A., and Renard, B. Y. (2021). Interpretable detection of novel human viruses from genome sequencing data. NAR Genomics and Bioinformatics, 3(1):lqab004.
Bhatt et al., (2020) Bhatt, U., Weller, A., and Moura, J. M. (2020). Evaluating and aggregating feature-based model explanations. arXiv preprint arXiv:2005.00631.
Binder et al., (2016) Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R., and Samek, W. (2016). Layer-wise relevance propagation for neural networks with local renormalization layers. In Artificial Neural Networks and Machine Learning–ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II 25, pages 63–71. Springer.
Chalasani et al., (2020) Chalasani, P., Chen, J., Chowdhury, A. R., Wu, X., and Jha, S. (2020). Concise explanations of neural networks using adversarial training. In International Conference on Machine Learning, pages 1383–1391. PMLR.
Chormai et al., (2022) Chormai, P., Herrmann, J., Müller, K.-R., and Montavon, G. (2022). Disentangled explanations of neural network predictions by finding relevant subspaces. arXiv preprint arXiv:2212.14855.
Eraslan et al., (2019) Eraslan, G., Avsec, Ž., Gagneur, J., and Theis, F. J. (2019). Deep learning: new computational modelling techniques for genomics. Nature Reviews Genetics, 20(7):389–403.
Gu et al., (2019) Gu, J., Yang, Y., and Tresp, V. (2019). Understanding Individual Decisions of CNNs via Contrastive Backpropagation. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 119–134. Springer.
Gupta et al., (2021) Gupta, S., Chan, Y. H., Rajapakse, J. C., Initiative, A. D. N., et al. (2021). Obtaining leaner deep neural networks for decoding brain functional connectome in a single shot. Neurocomputing, 453:326–336.
He et al., (2016) He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
Hedström et al., (2023) Hedström, A., Weber, L., Krakowczyk, D., Bareeva, D., Motzkus, F., Samek, W., Lapuschkin, S., and Höhne, M. M. M. (2023). Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond. Journal of Machine Learning Research, 24(34):1–11.
Iwana et al., (2019) Iwana, B. K., Kuroki, R., and Uchida, S. (2019). Explaining Convolutional Neural Networks Using Softmax Gradient Layer-wise Relevance Propagation. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 4176–4185. IEEE.
Jung et al., (2021) Jung, Y.-J., Han, S.-H., and Choi, H.-J. (2021). Explaining CNN and RNN Using Selective Layer-Wise Relevance Propagation. IEEE Access, 9:18670–18681.
Kohlbrenner et al., (2020) Kohlbrenner, M., Bauer, A., Nakajima, S., Binder, A., Samek, W., and Lapuschkin, S. (2020). Towards best practice in explaining neural network decisions with LRP. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE.
Lemanczyk et al., (2024) Lemanczyk, M. S., Bartoszewicz, J. M., and Renard, B. Y. (2024). Motif Interactions Affect Post-Hoc Interpretability of Genomic Convolutional Neural Networks. bioRxiv preprint bioRxiv:2024.02.15.580353.
Litjens et al., (2017) Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der Laak, J. A., Van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88.
Lundberg and Lee, (2017) Lundberg, S. M. and Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Montavon et al., (2019) Montavon, G., Binder, A., Lapuschkin, S., Samek, W., and Müller, K.-R. (2019). Layer-Wise Relevance Propagation: An Overview. Explainable AI: interpreting, explaining and visualizing deep learning, pages 193–209.
Montavon et al., (2018) Montavon, G., Samek, W., and Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital signal processing, 73:1–15.
Russakovsky et al., (2015) Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252.
Samek et al., (2016) Samek, W., Binder, A., Montavon, G., Lapuschkin, S., and Müller, K.-R. (2016). Evaluating the visualization of what a deep neural network has learned. IEEE transactions on neural networks and learning systems, 28(11):2660–2673.
Samek et al., (2021) Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., and Müller, K.-R. (2021). Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE, 109(3):247–278.
Schnake et al., (2021) Schnake, T., Eberle, O., Lederer, J., Nakajima, S., Schütt, K. T., Müller, K.-R., and Montavon, G. (2021). Higher-order explanations of graph neural networks via relevant walks. IEEE transactions on pattern analysis and machine intelligence, 44(11):7581–7596.
Shi et al., (2015) Shi, J., Yan, Q., Xu, L., and Jia, J. (2015). Hierarchical image saliency detection on extended CSSD. IEEE transactions on pattern analysis and machine intelligence, 38(4):717–729.
Simonyan and Zisserman, (2015) Simonyan, K. and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations.
Sixt et al., (2020) Sixt, L., Granz, M., and Landgraf, T. (2020). When explanations lie: Why many modified BP attributions fail. In International Conference on Machine Learning, pages 9046–9057. PMLR.
Yeom et al., (2021) Yeom, S.-K., Seegerer, P., Lapuschkin, S., Binder, A., Wiedemann, S., Müller, K.-R., and Samek, W. (2021). Pruning by explaining: A novel criterion for deep neural network pruning. Pattern Recognition, 115:107899.
Zhu and Gupta, (2017) Zhu, M. and Gupta, S. (2017). To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878.
