
Sparse Explanations of Neural Networks Using Pruned Layer-Wise Relevance Propagation

Paulo Yanez Sarmiento, Simon Witzke, Nadja Klein, Bernhard Y. Renard
Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany
Technische Universität Dortmund, Research Center Trustworthy Data Science and Security, Germany

Abstract

Explainability is a key component in many applications involving deep neural networks (DNNs). However, current explanation methods for DNNs commonly leave it to the human observer to distinguish relevant explanations from spurious noise. This is no longer feasible when moving from easily human-accessible data such as images to more complex data such as genome sequences. To facilitate the accessibility of DNN outputs from such complex data and to increase explainability, we present a modification of the widely used explanation method layer-wise relevance propagation. Our approach enforces sparsity directly by pruning the relevance propagation for the different layers. Thereby, we achieve sparser relevance attributions for the input features as well as for the intermediate layers. As the relevance propagation is input-specific, we aim to prune the relevance propagation rather than the underlying model architecture. This allows different neurons to be pruned for different inputs and hence might be more appropriate to the local nature of explanation methods. To demonstrate the efficacy of our method, we evaluate it on two types of data: images and genomic sequences. We show that our modification indeed leads to noise reduction and concentrates relevance on the most important features compared to the baseline.

Keywords

deep learning; explainable AI (XAI); genomics; pruning; sparsity

Acknowledgement

We gratefully acknowledge funding by grants KL 3037/7-1 (to NK) and RE 3474/8-1 (to BYR), project P5 in the Research Unit KI-FOR 5363 of the German Research Foundation (DFG).

1 Introduction

As the usage of deep neural networks (DNNs) has grown tremendously in a variety of fields, so has the interest in the explainability and interpretability of such models. Especially in sensitive areas such as medical imaging (Litjens et al., 2017) or genomics (Eraslan et al., 2019; Bartoszewicz et al., 2021), an understanding of how the model arrived at certain predictions is essential to build and ensure trustworthiness. Therefore, so-called post-hoc attribution or explanation methods have been introduced (Samek et al., 2021) to decode the black-box behavior common to DNN architectures. Here, we assume that a model has already been trained and consider a separate method to explain it. Instead of global explanations of the model, our focus is on local methods. Their general idea is to obtain an input-specific explanation of the decision behavior of the model by attributing relevance scores to every input dimension based on the model’s prediction. It has been shown that several of these methods are special cases of the more general game-theoretic concept of Shapley values (Lundberg and Lee, 2017). Layer-wise relevance propagation (LRP; Bach et al., 2015) is among these methods and propagates relevance backward through the network from one layer to another. The basic LRP approach was subsequently developed further and extended to other network architectures (Binder et al., 2016; Arras et al., 2017; Schnake et al., 2021; Ali et al., 2022; Montavon et al., 2019). It has gained increased popularity since then.

Figure 1: Illustration of sparsified explanations using our pruned layer-wise relevance propagation (PLRP): (a) Heatmaps of explanations for image classifications obtained through LRP (left) and their corresponding sparsified versions through PLRP (right). These sparser relevance attributions are more distinct explanations of the images and can help to identify and interpret the most important features. (b) Corresponding (exemplary) DNNs show how neurons with low relevance are removed, leading to sparser relevance attribution in every layer and fewer paths through which relevance is propagated. These sparser relevance attributions in the intermediate layers potentially allow a better understanding of the latent factors of the model.

For image classification, a human can evaluate and interpret the LRP explanation qualitatively by means of a suitable visualization such as heatmaps. The observer can identify the object to be detected and decide whether the corresponding relevance attribution across the input dimensions matches this object. However, for high-dimensional data with unknown ground truth, this might no longer be feasible. This is often the case for genome sequences, for example. Besides the high dimensionality, noisy relevance attributions can make it difficult to identify and interpret the main drivers of the model’s prediction in such data and hence diminish the meaningfulness of the explanation. As a solution, we propose a modification of LRP that generates sparser explanations by extending the idea of filtering or pruning the relevance propagation (Montavon et al., 2018). We call our approach pruned layer-wise relevance propagation (PLRP). Sparsification of the explanation might be desirable in the sense that it reduces noise and the number of features with non-zero relevance, i.e., it highlights only the most important features. Furthermore, our method is able to regulate the degree of sparsity, which can be beneficial for feature selection or the identification of concepts learned by the network in deeper layers. The basic intuition of our method is illustrated in Figure 1(a) via three heatmaps of images obtained through LRP (left) and their corresponding sparsified versions through PLRP (right).

Further, Figure 1(b) shows how the relevance propagation in our proposed method PLRP is pruned within a DNN by removing less relevant neurons. This idea of maintaining only the most important features is inspired by $L_1$-regularization. However, $L_1$-regularization cannot be integrated directly into LRP. This is because the map representing the relevance propagation from one layer to another, as defined by LRP, is input-specific. Hence, there is no single consistent map for all data points. Approximating all inputs with the same model would be a strong simplification of the original explanation method. It would also disregard the fact that different inputs might activate different neurons in intermediate layers. Altogether, this motivates our contribution of PLRP.

2 Related Work

Since LRP was introduced by Bach et al. (2015), several modifications of it have been proposed. Some work focuses on creating class-discriminative explanations by creating relevance attributions not only for the target class. Gu et al. (2019) and Iwana et al. (2019) propose modifications in which the negative prediction score is also propagated backward for non-target classes. Hence, the final explanation is the difference between target and non-target classes. Similarly, Montavon et al. (2019) point out that instead of the prediction score, one could also propagate the log-probability ratios to achieve class-discriminative explanations. However, Jung et al. (2021) show that some of these modifications exhibit a so-called erasing object problem. This means that when considering differences in explanations, some relevance scores might cancel each other out and thereby potentially falsely erase relevant features from the explanation. To overcome this issue, they introduce a modification of LRP that constrains the relevance propagation to neurons with positive gradient. Hence, when propagating backward, the relevance of some neurons is set to zero. Montavon et al. (2018) also present this idea of filtering the propagation through certain neurons. It is also used by Achtibat et al. (2022) to create more human-understandable explanations, assuming there is some understanding of the representation of data in the latent model space, i.e., the roles of the learned latent factors. They first identify these factors globally for the model and then restrict the propagation of relevance through them locally per input sample. Chormai et al. (2022) propose a numerical approach to identify relevant subspaces of the latent space for models with unknown roles of latent factors. These subspaces make it possible to create disentangled explanations that decompose the explanation into different components, which can then be visualized and interpreted.

Although related to model pruning (Zhu and Gupta, 2017), our approach and goal of pruning relevance propagation conceptually differ from it. Instead of pruning the underlying model, we aim to sparsify the individual explanations, which are local approximations of the model. Hence, we prune different neurons depending on the input features, thereby enabling a local and input-specific pruning. In contrast, when Gupta et al. (2021) and Yeom et al. (2021) applied LRP for model pruning, they used the relevance attribution aggregated over multiple inputs to prune the model globally. The authors show that the number of parameters can be reduced drastically, while still maintaining or even improving the predictive performance of the model.

3 Pruned Layer-Wise Relevance Propagation (PLRP)

We propose to extend the idea of filtering the propagation of relevance through certain neurons (Montavon et al., 2018) by using the relevance itself as the filter criterion. To the best of our knowledge, none of the previous works focused on creating sparse explanations in the sense of locally reducing the input features and latent factors to the most important ones and concentrating relevance mass there. As the explanations are input-specific, a local pruning of relevance propagation is more appropriate for our goal than a global pruning of the model. This motivates our pruned layer-wise relevance propagation (PLRP).

3.1 Setting

Let $f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{C},\ \boldsymbol{x}=(x_{1},\ldots,x_{d})^{\top}\mapsto f(\boldsymbol{x})=(f_{1}(\boldsymbol{x}),\ldots,f_{C}(\boldsymbol{x}))^{\top}$ be the pre-softmax prediction function of a trained DNN for a classification task with $C$ classes, which we also refer to as the model. For our setting, we are only interested in the prediction score of the winning class, denoted by $f_{c^{*}}(\boldsymbol{x})$. For a given model, an explanation method is a function $e:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}:\boldsymbol{x}\mapsto e(\boldsymbol{x})=(r_{1},\ldots,r_{d})^{\top}$ that assigns a relevance score $r_{i}$ to every input dimension $i=1,\ldots,d$. We denote by $W_{l}\in\mathbb{R}^{m\times n}$ the weight matrix of the $l$-th layer of the model and by $w_{jk}^{(l)}$ the weight of the edge connecting the $j$-th neuron in layer $l-1$ with the $k$-th neuron in layer $l$. The corresponding activations are denoted by $a_{j}^{(l-1)}$ and $a_{k}^{(l)}$, respectively. Hence, we can write $\boldsymbol{a}^{(l)}=\big(W^{\top}_{l}\boldsymbol{a}^{(l-1)}\big)^{+}$, where $(\cdot)^{+}$ refers to the entry-wise application of the ReLU function. Since for our purposes we consider only one layer at a time, we will drop the layer-specific superscripts for readability and simply write $w_{jk}$. Furthermore, let $(\cdot)^{-}$ be the function that only keeps the absolute value of the negative part and is zero otherwise.

Finally, for the relevance attribution, we refer to the relevance of the $k$-th neuron in layer $l$ as $r_{k}^{(l)}$ and write $\boldsymbol{r}^{(l)}=(r_{1}^{(l)},\ldots,r_{n}^{(l)})^{\top}$, in particular $\boldsymbol{r}^{(0)}=\boldsymbol{r}=e(\boldsymbol{x})$. For the last layer $L$, we have by definition $r_{c^{*}}^{(L)}:=f_{c^{*}}(\boldsymbol{x})$ and $r_{i}^{(L)}:=0$ for $i\neq c^{*}$.

As described earlier, LRP propagates the relevance backward from one layer to its predecessor, starting with the prediction score of the output layer $f_{c^{*}}(\boldsymbol{x})$. The total relevance mass is preserved in every layer by the so-called conservation property (Bach et al., 2015). The relevance propagation from layer $l$ to $l-1$ can be expressed by a linear map (Sixt et al., 2020), i.e., a multiplication with the matrix $M_{l}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}:\boldsymbol{r}^{(l)}\mapsto\boldsymbol{r}^{(l-1)}=M_{l}\boldsymbol{r}^{(l)}$. Thereby, the matrix entry in the $i$-th row and $j$-th column represents the proportion of relevance of neuron $j$ in layer $l$ that is allocated to neuron $i$ in layer $l-1$. As for the weights, we will disregard the layer subscript and only write $M$. There are different LRP rules for how these proportions are calculated (Montavon et al., 2018). Specifically, for the LRP-0 rule, the matrix $M$ is given by

$$M=\begin{bmatrix}\dfrac{a_{j}w_{jk}}{\sum_{j=1}^{m}a_{j}w_{jk}}\end{bmatrix}_{j=1,\ldots,m,\;k=1,\ldots,n}\,. \tag{3.1}$$

Note that, by definition, the columns of $M$ are normalized, which corresponds to redistributing the whole relevance of a neuron. For choosing a rule that yields measurably appropriate relevance attributions, we follow the composite strategy of Kohlbrenner et al. (2020) that combines various rules for different layer types.
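To make Equation (3.1) concrete, the following is a minimal NumPy sketch of one LRP-0 propagation step for a single dense layer. The function name `lrp0_matrix`, the toy values, and the handling of zero denominators (the column is set to zero) are illustrative assumptions and not part of the method description.

```python
import numpy as np

def lrp0_matrix(a_prev, W, eps=1e-12):
    """Return M with M[j, k] = a_j * w_jk / sum_j a_j * w_jk (cf. Equation 3.1)."""
    z = a_prev[:, None] * W                  # shape (m, n): contributions a_j * w_jk
    denom = z.sum(axis=0)                    # shape (n,): column-wise normalizer
    nonzero = np.abs(denom) > eps
    safe = np.where(nonzero, denom, 1.0)     # avoid division by zero
    # Assumption: columns with a (near-)zero normalizer propagate nothing.
    return np.where(nonzero, z / safe, 0.0)

a_prev = np.array([1.0, 2.0, 0.0])           # activations in layer l-1 (m = 3)
W = np.array([[0.5, -1.0],
              [1.0,  2.0],
              [3.0,  1.0]])                  # weights w_jk, shape (m, n) = (3, 2)
r_l = np.array([2.0, 1.0])                   # relevance in layer l

M = lrp0_matrix(a_prev, W)
r_prev = M @ r_l                             # relevance in layer l-1
```

In this toy example every column of `M` sums to one, so the total relevance of layer $l$ is carried over to layer $l-1$, and the neuron with zero activation receives zero relevance.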

3.2 Pruned LRP

The goal of pruned layer-wise relevance propagation (PLRP) is to create sparser explanations by reducing noise and simultaneously increasing the attribution to the most relevant features. We do this by pruning the explanation directly in every layer. This consists of two steps: first, determining the neurons to be pruned, and second, redistributing their relevance mass among the remaining neurons. The criterion for pruning the neurons is the relevance itself, i.e., the neurons with the lowest relevance scores get pruned. Hence, first, we conduct a preliminary unpruned relevance attribution to layer $l$, prune it, and then after redistribution, use it as input for the attribution to layer $l-1$. As it is unclear in advance how the relevance is distributed among the neurons, we prune a proportion of the total relevance mass instead of a proportion of the neurons.

By redistributing the pruned relevance mass among the remaining neurons, we maintain the relevance conservation of LRP. As we are potentially interested in strong negative contributions, we consider the positive and negative relevance separately, i.e., we aim to prune neurons with small relevance in absolute terms. Hence, in the following, by $\boldsymbol{r}^{(l)}$ we mean either $\big(\boldsymbol{r}^{(l)}\big)^{+}$ or $\big(\boldsymbol{r}^{(l)}\big)^{-}$. Note that the proportion of pruned mass can be different for the positive and negative parts of the relevance vector. We do not prune the relevance propagation in the last step, i.e., the first layer is left unpruned. Pruning there would correspond to a simple thresholding of the input relevance, because it is not propagated through any further layers, and is thus neither desired nor meaningful.

3.2.1 Pruning Relevance

We denote the unpruned relevance vector by $\widetilde{\boldsymbol{r}}^{(l-1)}$. Then, for a pre-defined proportion $p_{l}\in[0,1]$ of the total relevance mass that is supposed to be pruned, we determine the corresponding threshold $\theta_{l}\in\mathbb{R}$ such that the relevance scores at or below $\theta_{l}$ sum up to at most $p_{l}$ of the total relevance mass. Hence, for (unpruned) ordered relevance scores $\widetilde{r}_{s_{1}}^{(l-1)}\leq\widetilde{r}_{s_{2}}^{(l-1)}\leq\ldots\leq\widetilde{r}_{s_{n}}^{(l-1)}$, we set

$$\theta_{l}:=\max_{1\leq i^{*}\leq n}\Bigg\{\widetilde{r}_{s_{i^{*}}}^{(l-1)}\,\Bigg|\,\sum_{i=1}^{i^{*}}\widetilde{r}_{s_{i}}^{(l-1)}\leq p_{l}\sum_{i=1}^{n}\widetilde{r}_{i}^{(l-1)}\Bigg\}\,. \tag{3.2}$$
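The threshold in Equation (3.2) can be sketched with a sort and a cumulative sum. This is a minimal illustration that assumes a non-negative relevance vector (i.e., the positive or the negative part, treated separately); the function name `mass_threshold` and the toy values are hypothetical. When no prefix of the sorted scores fits the budget, the sketch returns 0, following the convention the text adopts for an empty set.

```python
import numpy as np

def mass_threshold(r_tilde, p):
    """Largest sorted score whose cumulative sum stays within p * total mass (cf. Eq. 3.2)."""
    r_sorted = np.sort(r_tilde)              # ascending order r_{s_1} <= ... <= r_{s_n}
    cum = np.cumsum(r_sorted)                # cumulative mass of the smallest scores
    within_budget = cum <= p * r_sorted.sum()
    if not within_budget.any():              # empty set: nothing can be pruned
        return 0.0
    return float(r_sorted[within_budget].max())

r_tilde = np.array([0.05, 0.6, 0.1, 0.25])   # unpruned relevance, non-negative
p = 0.2                                      # proportion of mass allowed to be pruned

theta = mass_threshold(r_tilde, p)
mask = r_tilde > theta                       # neurons that are kept
kept_mass = r_tilde[mask].sum()
```

Here the two smallest scores (0.05 and 0.1) fit within the budget of 20% of the total mass, so the threshold equals 0.1 and the two largest scores are kept.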

If $p_{l}$ leads to an empty set in (3.2), we set $\theta_{l}:=0$. Hence, we obtain the desired relevance pruning as

$$\left\|\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\geq(1-p_{l})\left\|\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\,,$$

where $\odot$ denotes the element-wise multiplication and $\boldsymbol{1}_{\{\,\boldsymbol{\cdot}\,>\theta_{l}\}}$ is the element-wise application of the indicator function. Hence, we only keep the entries where the neuron’s relevance lies above the threshold $\theta_{l}$. Every $p_{l}$ can be seen as a hyperparameter that could be tuned. However, for deep networks with many layers, this can easily become computationally very expensive. Further, as the relevance attribution is input-specific, the optimal parameter might differ significantly for different samples. Therefore, we consider an additional approach that determines $\theta_{l}$, and hence $p_{l}$, after the relevance propagation in every layer. As we are interested in sparse relevance vectors, we consider the so-called sparsity gain. It measures the additional degree of sparsity obtained per unit of pruned relevance. The idea is that neurons with small relevance yield a high gain, while the gain decreases for neurons that concentrate mass, as more mass must be pruned for an additional degree of sparsity. Since the total relevance may differ for different inputs, we consider the relative pruning, i.e., the proportion $p_{l}$. To this end, consider a real-valued vector $v_{0}$. We iteratively sparsify $v_{0}$ by setting some entries to zero. Let $v^{\prime}$ denote the vector after setting some entries in $v_{0}$ to zero and $v^{\prime\prime}$ the vector after further entries in $v^{\prime}$ were set to zero. Then, for some sparsity measure $s$ that increases with the degree of sparsity, the sparsity gain for going from $v^{\prime}$ to $v^{\prime\prime}$ is defined as

$$\Delta_{SG}:=\frac{\Delta s}{\Delta p_{l}}=\|v_{0}\|_{1}\,\frac{s(v^{\prime\prime})-s(v^{\prime})}{\|v^{\prime\prime}\|_{1}-\|v^{\prime}\|_{1}}\,. \tag{3.3}$$

When entries of $v_{0}$ are set to zero iteratively in order of increasing value, the sparsity gain decreases monotonically. Hence, a minimal allowed sparsity gain $\Delta_{SG}$ determines a proportion of pruned mass $p_{l}$ and thus a threshold $\theta_{l}$. Therefore, the sparsity gain approach makes PLRP independent of choosing fixed proportions $p_{l}$ in advance. Instead, every $p_{l}$ is determined during the process of pruning. This procedure is related to considering a minimal or maximal change of a function, i.e., its derivative. Nevertheless, the choice of a minimal sparsity gain is necessary. A natural choice would be 1, i.e., pruning stops when an additional degree of sparsity becomes more expensive than one unit of pruned mass.
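The sparsity gain criterion can be illustrated with a short sketch. Several choices below are assumptions made only for illustration: the sparsity measure $s$ is taken to be the fraction of zero entries (the text leaves $s$ open), entries are pruned one at a time in increasing order, the vector is assumed non-negative, and $\Delta p_{l}$ is computed as the removed mass relative to $\|v_{0}\|_{1}$ so that the gain is non-negative. The names `zero_fraction` and `prune_by_sparsity_gain` are hypothetical.

```python
import numpy as np

def zero_fraction(v):
    """Assumed sparsity measure s: fraction of entries equal to zero."""
    return float(np.mean(v == 0.0))

def prune_by_sparsity_gain(v0, min_gain=1.0):
    """Prune the smallest entries while the sparsity gain stays >= min_gain."""
    v0 = np.asarray(v0, dtype=float)
    total = np.abs(v0).sum()
    if total == 0.0:
        return 0.0, 0.0
    v, theta = v0.copy(), 0.0
    for idx in np.argsort(v0):               # increasing order of value
        if v0[idx] == 0.0:
            continue                         # already zero, nothing to prune
        v_next = v.copy()
        v_next[idx] = 0.0
        delta_s = zero_fraction(v_next) - zero_fraction(v)
        # Removed mass as a proportion of the original mass (sign chosen so gain >= 0).
        delta_p = (np.abs(v).sum() - np.abs(v_next).sum()) / total
        if delta_s / delta_p < min_gain:
            break                            # further sparsity is too expensive
        v, theta = v_next, float(v0[idx])
    p = 1.0 - np.abs(v).sum() / total        # resulting proportion of pruned mass
    return theta, p

theta, p = prune_by_sparsity_gain([0.05, 0.55, 0.1, 0.3], min_gain=1.0)
```

Under these assumptions, a minimal gain of 1 prunes exactly the entries that lie below the mean of the vector: in the toy example the entries 0.05 and 0.1 are removed, while pruning 0.3 would yield a gain below 1.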

3.2.2 Redistributing Pruned Relevance

The pruned relevance is redistributed among the remaining neurons with non-zero relevance. We consider two possible approaches. The first one is based on the relative proportion of the neuron’s relevance after pruning. This is equivalent to a rescaling by a factor $\lambda$, such that the total relevance mass again sums up to the value before pruning. We refer to this procedure as PLRP-$\lambda$. For this, define

$$\boldsymbol{r}^{(l-1)}:=\lambda_{l}\,\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\,,$$

with

$$\lambda_{l}=\left\|\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\,\Big/\,\left\|\boldsymbol{1}_{\left\{\widetilde{\boldsymbol{r}}^{(l-1)}>\theta_{l}\right\}}\odot\widetilde{\boldsymbol{r}}^{(l-1)}\right\|_{1}\approx\frac{1}{1-p_{l}}\,.$$
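The rescaling of PLRP-$\lambda$ can be sketched in a few lines. This is a minimal illustration for a non-negative relevance vector and a given threshold; the function name `plrp_lambda`, the toy values, and the fallback for the case in which every neuron is pruned are assumptions.

```python
import numpy as np

def plrp_lambda(r_tilde, theta):
    """Zero out scores at or below theta and rescale the rest to conserve total mass."""
    kept = (r_tilde > theta) * r_tilde       # indicator mask applied element-wise
    kept_mass = np.abs(kept).sum()
    if kept_mass == 0.0:                     # assumption: nothing left to rescale
        return kept
    lam = np.abs(r_tilde).sum() / kept_mass  # rescaling factor lambda_l
    return lam * kept

r_tilde = np.array([0.05, 0.6, 0.1, 0.25])   # unpruned relevance
r_pruned = plrp_lambda(r_tilde, theta=0.1)
```

In this example 15% of the mass is removed, so the factor is 1/0.85, which is close to but not identical to $1/(1-p_{l})$ for $p_{l}=0.2$; this reflects the approximation sign in the definition of $\lambda_{l}$.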

For the second approach, denoted by PLRP-$M$, we make use of the fact that LRP assigns zero relevance to neurons with zero activation, i.e., $a_{j}^{(l)}=0 \Rightarrow r_{j}^{(l)}=0$. Therefore, (re-)applying LRP with more zero activations leads to sparser relevance attribution without the necessity of rescaling the relevance mass. We do this by modifying the matrix $M$ expressing the relevance propagation from Equation (3.1). Given the unpruned relevance vector $\widetilde{\boldsymbol{r}}^{(l-1)}$, we define the modified activations by

$$\widehat{\boldsymbol{a}}^{(l-1)} := \boldsymbol{1}_{\{\widetilde{\boldsymbol{r}}^{(l-1)} > \theta_{l}\}} \odot \boldsymbol{a}^{(l-1)} \tag{3.4}$$

and hence, the modified relevance attribution as

$$\widehat{M}_{l} := \begin{bmatrix} \dfrac{\widehat{a}_{j}^{(l-1)} w_{jk}^{(l)}}{\sum_{j=1}^{m} \widehat{a}_{j}^{(l-1)} w_{jk}^{(l)}} \end{bmatrix}_{j=1,\ldots,m,\; k=1,\ldots,n} \tag{3.5}$$

$$\boldsymbol{r}^{(l-1)} := \widehat{M}_{l}\,\boldsymbol{r}^{(l)}\,. \tag{3.6}$$

Hence, by attributing relevance according to $\widehat{M}_{l}$, we ensure that neurons that would otherwise have been assigned a relevance mass below the threshold $\theta_{l}$ of Equation 3.2 receive zero relevance. As $\widehat{M}_{l}$ is also a matrix in the form of Equation 3.1, only with more zero activations, the pruned relevance gets implicitly redistributed in the manner of LRP.
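Equations (3.4) to (3.6) can be sketched for a single dense layer as follows. The helper names `lrp_step` and `plrp_m_step`, the handling of zero denominators, and the toy numbers are assumptions for illustration only; the threshold is again passed in directly.

```python
import numpy as np

def lrp_step(a, W, r_next):
    """Propagate relevance with M = [a_j w_jk / sum_j a_j w_jk]."""
    contrib = a[:, None] * W                       # a_j * w_jk, shape (m, n)
    denom = contrib.sum(axis=0)                    # one denominator per output neuron k
    # Assumption: an output neuron with a zero denominator passes on no relevance.
    safe = np.where(denom != 0.0, denom, 1.0)
    M = np.where(denom != 0.0, contrib / safe, 0.0)
    return M @ r_next

def plrp_m_step(a_prev, W, r_next, theta):
    r_tilde = lrp_step(a_prev, W, r_next)          # unpruned relevance
    a_hat = np.where(r_tilde > theta, a_prev, 0.0) # Eq. (3.4): zero out pruned activations
    return lrp_step(a_hat, W, r_next)              # Eqs. (3.5) and (3.6)

a = np.array([1.0, 2.0, 0.5])                      # hypothetical activations a^(l-1)
W = np.array([[1.0, 0.5],
              [0.5, 1.0],
              [0.2, 0.1]])                         # hypothetical weights w_jk^(l)
r_next = np.array([1.0, 2.0])                      # hypothetical relevance r^(l)
r = plrp_m_step(a, W, r_next, theta=0.1)
```

Here the third neuron falls below the threshold and receives zero relevance, while the total relevance of 3 is conserved without any explicit rescaling.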

4 Evaluation

We evaluate the two variants of PLRP in the context of classification of two different data types: images and genomic sequences. For the quantitative evaluation, we use multiple metrics covering different aspects of the explanation method following the categorization of Hedström et al. (2023). Further, we evaluate our methods qualitatively by considering heatmaps and sequence logo plots of the relevance attributions.

4.1 Evaluation Metrics

Sparsity and localization are intended to measure the efficacy of our method, while faithfulness and robustness serve as sanity checks for its trustworthiness.

Sparsity

As we are interested in sparse explanations that concentrate the relevance mass on a few important features, we evaluate the degree of sparsity or complexity by the Gini Index and the entropy of the relevance attribution, as proposed by Chalasani et al. (2020) and Bhatt et al. (2020). A higher Gini Index or a lower entropy, respectively, indicates that there are more zero entries and that more mass is concentrated on fewer entries.
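Both sparsity measures can be sketched in a few lines. The formulas below are common textbook forms of the Gini Index and the Shannon entropy applied to absolute relevance values; whether they match the exact variants used in the evaluation is an assumption.

```python
import numpy as np

def gini_index(r):
    """Gini Index of the absolute relevances: 0 for uniform, (n-1)/n for one-hot."""
    x = np.sort(np.abs(np.asarray(r, dtype=float)).ravel())  # ascending order
    n, total = x.size, x.sum()
    if total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))

def entropy(r):
    """Shannon entropy of the normalized absolute relevances (natural log)."""
    p = np.abs(np.asarray(r, dtype=float)).ravel()
    total = p.sum()
    if total == 0.0:
        return 0.0
    p = p[p > 0] / total                                     # drop zeros: 0 * log 0 := 0
    return float(-np.sum(p * np.log(p)))

dense = np.ones(4)                        # relevance spread evenly
sparse = np.array([0.0, 0.0, 0.0, 1.0])   # relevance concentrated on one feature
```

For these two toy vectors, the concentrated attribution has the higher Gini Index and the lower entropy, which is the direction described above.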

Localization

Localization measures how accurately the relevance is attributed to a pre-specified region. Hence, for this metric, a known ground truth is needed. We therefore calculate the Relevance Mass Accuracy (RMA; Arras et al., 2022), which measures how much relevance mass out of the total mass is allocated to the ground truth.
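As a minimal sketch, the RMA is the ratio of relevance mass inside the ground truth mask to the total relevance mass. Clipping negative relevance to zero before summing is an assumption of this sketch, as are the function name and the toy heatmap.

```python
import numpy as np

def relevance_mass_accuracy(r, mask):
    """Share of the (non-negative) relevance mass that falls inside the mask."""
    r = np.clip(np.asarray(r, dtype=float), 0.0, None)  # assumption: only positive relevance counts
    mask = np.asarray(mask, dtype=bool)
    total = r.sum()
    return float(r[mask].sum() / total) if total > 0 else 0.0

heatmap = np.array([[0.0, 0.2],
                    [0.6, 0.2]])               # hypothetical relevance attribution
gt_mask = np.array([[False, False],
                    [True,  True]])            # hypothetical ground truth region
rma = relevance_mass_accuracy(heatmap, gt_mask)
```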

Faithfulness

Faithfulness measures whether the explanation method indeed uncovers the predictive behavior of the underlying model. This means that the highest relevance scores correspond to the most decisive input features. The idea is to perturb the input based on the relevance attribution and then measure how quickly the predictive score drops. This procedure is also referred to as pixel-flipping (Bach et al., 2015) or selectivity (Samek et al., 2016). Hence, we have a perturbation function that takes the model input $\boldsymbol{x}$ and the corresponding relevance attribution $\boldsymbol{r}=e(\boldsymbol{x})$ and outputs a sequence of perturbed versions of $\boldsymbol{x}$ based on $\boldsymbol{r}$, i.e., $(\boldsymbol{x},\boldsymbol{r})\mapsto(\boldsymbol{x}_{1}^{\prime},\ldots,\boldsymbol{x}_{s}^{\prime})$, where $s$ denotes the number of perturbation steps. As the actual value of prediction scores can differ from one class to another, we normalize the scores and measure the relative drop. Further, to compare the results across different pruning parameters $p_{l}$, we also consider the area under the curve (AUC). Note that a quicker decrease in the prediction score corresponds to a lower AUC and hence, to a better explanation with respect to this metric.
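The perturbation procedure and the AUC can be sketched as follows. The toy linear model, the zero baseline used for perturbed features, and the function names are assumptions for illustration; the experiments themselves rely on existing packages rather than on this code.

```python
import numpy as np

def pixel_flipping_curve(f, x, r, steps, baseline=0.0):
    """Perturb features in order of decreasing relevance and record the relative score."""
    x_pert = np.array(x, dtype=float)
    order = np.argsort(-np.asarray(r, dtype=float))  # most relevant features first
    scores = [f(x_pert)]
    for chunk in np.array_split(order, steps):
        x_pert[chunk] = baseline                     # replace the next block of features
        scores.append(f(x_pert))
    scores = np.array(scores, dtype=float)
    return scores / scores[0]                        # normalize, assumes f(x) != 0

def curve_auc(curve):
    """Trapezoidal area under the curve with the x-axis scaled to [0, 1]."""
    return float(np.sum((curve[1:] + curve[:-1]) / 2.0) / (len(curve) - 1))

w = np.array([3.0, 2.0, 1.0, 0.5])                   # hypothetical linear model
f = lambda z: float(w @ z)
x = np.ones(4)
good = pixel_flipping_curve(f, x, r=w * x, steps=4)      # faithful attribution
bad = pixel_flipping_curve(f, x, r=-(w * x), steps=4)    # reversed attribution
```

For this toy model, the faithful attribution makes the score drop faster and therefore yields the lower AUC, in line with the convention stated above.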

Robustness

For explanation methods, robustness refers to the idea that similar inputs should produce similar explanations. Hence, it can be seen as a sensitivity proxy that measures how much the explanation changes for a small shift in the input. Alvarez-Melis and Jaakkola (2018) propose to estimate the local Lipschitz constant, i.e., for $\varepsilon>0$ and some input $\boldsymbol{x}\in\mathbb{R}^{d}$, the aim is to find a constant $L=L(\boldsymbol{x})$ such that for all $\boldsymbol{x}^{\prime}\in B_{\varepsilon}(\boldsymbol{x})=\{\boldsymbol{x}^{\prime}:\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|<\varepsilon\}$ it holds that $\|e(\boldsymbol{x})-e(\boldsymbol{x}^{\prime})\|<L\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|$. Hence, $L$ is the steepest possible ascent of $e$ in a small neighborhood of $\boldsymbol{x}$. As calculating the gradient of the explanation method might be very expensive, or it might not even be differentiable, $L$ is estimated numerically by sampling from $B_{\varepsilon}(\boldsymbol{x})$.
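A sampling-based estimate of $L$ can be sketched as the largest observed ratio $\|e(\boldsymbol{x})-e(\boldsymbol{x}^{\prime})\| / \|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|$ over random points in the ball. The uniform sampling scheme, the Euclidean norm, the sample count, and the toy explanation function are assumptions of this sketch.

```python
import numpy as np

def local_lipschitz_estimate(explain, x, eps, n_samples=100, seed=0):
    """Estimate L(x) as the maximum explanation change per unit input change."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    e_x = explain(x)
    best = 0.0
    for _ in range(n_samples):
        direction = rng.normal(size=x.shape)
        direction /= np.linalg.norm(direction)
        radius = eps * rng.uniform() ** (1.0 / x.size)   # uniform sample inside the ball
        x_prime = x + radius * direction
        dist = np.linalg.norm(x - x_prime)
        if dist == 0.0:                                   # skip degenerate samples
            continue
        ratio = np.linalg.norm(e_x - explain(x_prime)) / dist
        best = max(best, ratio)
    return best

toy_explain = lambda z: 2.0 * z          # hypothetical explanation with Lipschitz constant 2
L_hat = local_lipschitz_estimate(toy_explain, np.array([1.0, -0.5, 0.3]), eps=0.1)
```

Since the estimate is a maximum over a finite sample, it is a lower bound on the true local constant, which is why several samples per input are needed.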

4.2 Experiments

4.2.1 Image Classification

Model Specification

We perform our evaluation on the ImageNet dataset (Russakovsky et al., 2015) and the Extended Complex Scene Saliency Dataset (ECSSD) (Shi et al., 2015). We consider two established model architectures: VGG-16 (Simonyan and Zisserman, 2015) and ResNet-50 (He et al., 2016), both convolutional neural networks (CNNs). The ground truth masks of ECSSD are more granular than the bounding boxes of ImageNet, which contain more pixels than the actual ground truth object. Therefore, ECSSD is more suitable for measuring localization. However, for comparison and completeness, we also measure sparsity, faithfulness, and robustness on the widely used ImageNet dataset. For this, we use a sample of 3,900 images of its validation set covering all classes. As we must draw multiple samples to calculate the Lipschitz estimate for a single input for robustness analyses, we restrict the number of input samples to 10. The evaluation includes both of our approaches, PLRP-$\lambda$ and PLRP-$M$, with and without sparsity gain.

Our implementation of PLRP (code available at https://anonymous.4open.science/r/plrp-FD34) is based on the Zennit package (Anders et al., 2021). For the evaluation, we use the Quantus package (Hedström et al., 2023) that implements several evaluation metrics. For evaluating faithfulness, we use a feature of the iNNvestigate package (Alber et al., 2019) that perturbs the input $\boldsymbol{x}$ based on its explanation $e(\boldsymbol{x})$.

Results

We observe that our approach creates sparser explanations, while maintaining the most relevant features. Figure 2 illustrates that the Gini Index increases drastically for all our approaches compared to the LRP baseline. Simultaneously, the RMA increases, indicating that after pruning and redistributing a higher proportion of relevance falls within the ground truth mask. In both metrics, we observe a steep increase already for small proportions ($<0.2$) of pruned relevance $p$. For both models, VGG16 and ResNet50, the simpler approach PLRP-$\lambda$ with and without sparsity gain leads to better results compared to PLRP-$M$ and the LRP baseline. We also find that the RMA starts decreasing again for higher $p$. This is not surprising as, at some point, the most decisive features might also get pruned. These findings are in line with what we observe qualitatively by visualization (see Figure 4), i.e., PLRP-$\lambda$ produces sparser explanations than PLRP-$M$ for the same pruning parameter $p$. For higher $p$, relevance within the ground truth mask also gets pruned. The entropy shows completely analogous results to the Gini Index and hence, is not displayed here.

With respect to faithfulness, except for one approach (PLRP-$M$ with and without sparsity gain and $p<0.3$), we observe slightly worse results than the baseline. Note that better results correspond to a lower AUC as we are interested in a quick drop in the prediction score. However, we observe that the higher AUC is mainly driven by less important features, i.e., we have a less steep decrease in the prediction score for later perturbed and hence, less important features. This is in line with our goal of maintaining only the most important features. In fact, for the features with the highest relevance, the prediction score drops similarly steeply as for the baseline (see Figure 3). Overall, the relative decline in faithfulness is small compared to the relative gain in sparsity and localization, especially for small $p$.

Finally, PLRP-$\lambda$ is always more robust than the LRP baseline. Only PLRP-$M$ is less robust than the baseline, and only for a few parameterizations. Therefore, the explanations of our approach can be seen as reliable and stable.

Figure 2: Results for ECSSD for metrics sparsity, localization, faithfulness, and robustness for different proportions of pruned relevance mass $p$ for models VGG16 and ResNet50. For sparsity, localization, and faithfulness, the y-axis covers the whole output domain of $[0,1]$. The zoom plots focus on the actual covered output domain for better comparison of the methods.

Figure 3: Results for ECSSD for faithfulness for an exemplary parameterization $p=0.15$ for models VGG16 (left) and ResNet50 (right). The prediction score $f_{c^{*}}(\boldsymbol{x})$ drops similarly steeply as for the LRP baseline for the features with the highest relevance that are perturbed first. The difference in AUC is rather driven by the less important features that are perturbed later.

Figure 4: Illustration of relevance attribution via heatmaps for LRP and PLRP with different parameterizations for the VGG16 model. Red pixels indicate positive relevance, while blue pixels indicate negative relevance.

Overall, PLRP-$\lambda$ is able to increase sparsity more quickly and concentrate more relevance within the ground truth mask than PLRP-$M$, i.e., there is a steeper increase for the same proportion of pruned relevance. It appears that redistributing the pruned relevance in the manner of LRP as done by PLRP-$M$ slows down the concentration of relevance. This is also in line with PLRP-$M$ sticking closer to the baseline with respect to faithfulness. Another driver for the better results of PLRP-$\lambda$ over PLRP-$M$ with respect to sparsity, localization, and robustness appears to be that PLRP-$M$ can lead to a sign flipping of a neuron’s relevance (see negative relevance for PLRP-$M$ in Figure 4(d)). PLRP-$M$ changes the proportions of how much relevance is attributed from one neuron to another (see Equation 3.5). Since this might increase the proportion of incoming negative relevance or decrease it for incoming positive relevance, respectively, the sign of the total relevance a neuron receives might switch. Furthermore, the sign of the proportions themselves might change. PLRP-$\lambda$, on the other hand, only rescales the relevance by a positive factor and hence, preserves the initial sign of the relevance score.
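The sign-flipping effect can be reproduced with a small constructed example. All numbers below are hypothetical and chosen only so that pruning one neuron changes the propagation proportions of another; the helper `lrp_step` assumes non-zero denominators.

```python
import numpy as np

def lrp_step(a, W, r_next):
    """Propagate relevance with proportions a_j w_jk / sum_j a_j w_jk."""
    contrib = a[:, None] * W
    return (contrib / contrib.sum(axis=0)) @ r_next   # assumes non-zero denominators

a = np.array([1.0, 1.0, 1.0])                # hypothetical activations
W = np.array([[ 1.0, -1.0],
              [-0.5,  1.0],
              [ 0.5,  2.0]])                 # hypothetical weights
r_next = np.array([1.0, 1.0])                # hypothetical upper-layer relevance
theta = 0.1

r_tilde = lrp_step(a, W, r_next)             # unpruned: [0.5, 0.0, 1.5]
mask = r_tilde > theta                       # the second neuron is pruned

# PLRP-M: re-propagate with the pruned activation set to zero.
r_m = lrp_step(np.where(mask, a, 0.0), W, r_next)    # approx. [-0.333, 0.0, 2.333]

# PLRP-lambda: rescale the surviving relevance by a positive factor.
kept = np.where(mask, r_tilde, 0.0)
r_lam = kept * np.abs(r_tilde).sum() / np.abs(kept).sum()   # [0.5, 0.0, 1.5]
```

The first neuron has positive relevance before pruning. Under PLRP-$M$, removing the second neuron shrinks its positive share from the first output neuron and enlarges its negative share from the second, so its relevance turns negative, whereas PLRP-$\lambda$ keeps the sign. Both variants conserve the total relevance of 2.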

The quantitative results for the ImageNet dataset for sparsity, faithfulness, and robustness are analogous to the results for ECSSD and can be found in the supplementary material in section A.

4.2.2 Genomics

Model Specification

Next, we apply PLRP to explain predictions of CNNs trained on synthetic genome sequence data with known ground truth (Lemanczyk et al., 2024). Specifically, the input consists of strings of length 250 from the alphabet $\{A,C,G,T\}$ representing the four bases in DNA. The sequences are one-hot-encoded. Hence, our input domain is $\{0,1\}^{4\times 250}$. We consider two different model architectures: one with only four filters and another one with 32. As we operate on a discrete domain, slight changes in the input or perturbations are not well-defined. Therefore, we only calculate the metrics for sparsity and localization.
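The input encoding can be sketched as follows. The row order A, C, G, T and the randomly generated sequence are assumptions for illustration; the actual data come from the cited synthetic benchmark.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed row order of the one-hot encoding

def one_hot_encode(seq):
    """Map a DNA string to a binary matrix of shape (4, len(seq))."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    encoded = np.zeros((len(ALPHABET), len(seq)), dtype=np.uint8)
    encoded[idx, np.arange(len(seq))] = 1    # exactly one active row per position
    return encoded

rng = np.random.default_rng(0)
seq = "".join(rng.choice(list(ALPHABET), size=250))   # hypothetical random sequence
x = one_hot_encode(seq)
```

Because every column contains exactly one non-zero entry, a small continuous perturbation of `x` leaves the input domain, which is why robustness and faithfulness are not evaluated for this data type.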

Results

Similar to the task of image classification, both PLRP variants produce sparser explanations compared to the LRP baseline. We observe qualitatively (see Figure 5) that noise is reduced and relevance is concentrated at the ground truth patterns in the genome sequence (motifs). As in the image classification setting, we find that, in general, the RMA increases for an increasing proportion of pruned relevance $p$. Hence, the more we prune and redistribute, the more relevance lies within the ground truth mask. Simultaneously, sparsity rises, while complexity is reduced as indicated by an increasing Gini Index.

Figure 5: Illustration of pruned explanations: Logo plots of (a) ground truth mask, (b) relevance scores for LRP, and (c) PLRP-$\lambda$ with $p=0.25$ for the model’s prediction. PLRP reduces noise and creates a sparser explanation compared to the LRP baseline. Thereby, PLRP maintains features that lie within the ground truth even if they have less relevance than the noise at other irrelevant features. Hence, it indeed differs from only applying a threshold.

Figure 5 also shows that this noise reduction indeed differs from simple thresholding, as it maintains relevance scores within the ground truth mask that are smaller than the noise signal at other irrelevant features.

The results of PLRP-$\lambda$ and PLRP-$M$ differ only slightly. This might be due to the shallower networks used here compared to those used for image classification. The fewer layers there are, the less the two variants can diverge from each other. The quantitative results for the genomics application can be found in the supplementary material in section B.

5 Conclusion

In this work, we introduced PLRP, a modification of LRP that enforces sparsity directly by locally pruning the relevance propagation for the different layers. We presented two approaches to redistribute the pruned relevance mass so that the conservation property is preserved: the simpler PLRP-$\lambda$ and the more complex PLRP-$M$, which is more in line with the original LRP methodology. The evaluation on the ECSSD and ImageNet datasets shows that we indeed obtain sparser explanations than the LRP baseline. The pruning leads to only a slight decrease in faithfulness, mainly driven by the least important features. By measuring localization, we show that while becoming sparser, more relevance is attributed to features within the ground truth mask. This demonstrates the efficacy of PLRP compared to LRP. In fact, both our approaches lead to noise reduction and concentrate relevance at the most important features. Qualitative evaluation by visualization supports this claim. Furthermore, our approach is similarly robust to small changes in the input as the LRP baseline and, depending on the parameterization, even more robust. For the genomics application, we observe similar effects. We obtain sparser explanations with higher concentration of relevance on the most important features in the genome sequence, aiding in the interpretation of the model outputs. Overall, PLRP-$\lambda$ produces better results for our goal of sparsity and relevance concentration than PLRP-$M$. This is partially driven by the effect of relevance sign flipping. However, it should be investigated whether this effect can be reduced by further modification. Additionally, how to define and find an optimal parameterization for PLRP remains an open question. This might heavily depend on the underlying model but also on the “optimal” degree of sparsity for a specific task.
Finally, as PLRP not only prunes the relevance attribution for the input features but also for neurons in intermediate layers, it could be used to study what concepts a model learned in deeper layers.

References

  • Achtibat et al., (2022) Achtibat, R., Dreyer, M., Eisenbraun, I., Bosse, S., Wiegand, T., Samek, W., and Lapuschkin, S. (2022). From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation. arXiv preprint arXiv:2206.03208.
  • Alber et al., (2019) Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K. T., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., and Kindermans, P.-J. (2019). iNNvestigate Neural Networks! Journal of Machine Learning Research, 20(93):1–8.
  • Ali et al., (2022) Ali, A., Schnake, T., Eberle, O., Montavon, G., Müller, K.-R., and Wolf, L. (2022). Xai for transformers: Better explanations through conservative propagation. In International Conference on Machine Learning, pages 435–451. PMLR.
  • Alvarez-Melis and Jaakkola, (2018) Alvarez-Melis, D. and Jaakkola, T. S. (2018). On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049.
  • Anders et al., (2021) Anders, C. J., Neumann, D., Samek, W., Müller, K.-R., and Lapuschkin, S. (2021). Software for Dataset-wide XAI: From Local Explanations to Global Insights with Zennit, CoRelAy, and ViRelAy. CoRR, abs/2106.13200.
  • Arras et al., (2017) Arras, L., Montavon, G., Müller, K.-R., and Samek, W. (2017). Explaining recurrent neural network predictions in sentiment analysis. arXiv preprint arXiv:1706.07206.
  • Arras et al., (2022) Arras, L., Osman, A., and Samek, W. (2022). CLEVR-XAI: a benchmark dataset for the ground truth evaluation of neural network explanations. Information Fusion, 81:14–40.
  • Bach et al., (2015) Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE, 10.
  • Bartoszewicz et al., (2021) Bartoszewicz, J. M., Seidel, A., and Renard, B. Y. (2021). Interpretable detection of novel human viruses from genome sequencing data. NAR Genomics and Bioinformatics, 3(1):lqab004.
  • Bhatt et al., (2020) Bhatt, U., Weller, A., and Moura, J. M. (2020). Evaluating and aggregating feature-based model explanations. arXiv preprint arXiv:2005.00631.
  • Binder et al., (2016) Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R., and Samek, W. (2016). Layer-wise relevance propagation for neural networks with local renormalization layers. In Artificial Neural Networks and Machine Learning–ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II 25, pages 63–71. Springer.
  • Chalasani et al., (2020) Chalasani, P., Chen, J., Chowdhury, A. R., Wu, X., and Jha, S. (2020). Concise explanations of neural networks using adversarial training. In International Conference on Machine Learning, pages 1383–1391. PMLR.
  • Chormai et al., (2022) Chormai, P., Herrmann, J., Müller, K.-R., and Montavon, G. (2022). Disentangled explanations of neural network predictions by finding relevant subspaces. arXiv preprint arXiv:2212.14855.
  • Eraslan et al., (2019) Eraslan, G., Avsec, Ž., Gagneur, J., and Theis, F. J. (2019). Deep learning: new computational modelling techniques for genomics. Nature Reviews Genetics, 20(7):389–403.
  • Gu et al., (2019) Gu, J., Yang, Y., and Tresp, V. (2019). Understanding Individual Decisions of CNNs via Contrastive Backpropagation. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 119–134. Springer.
  • Gupta et al., (2021) Gupta, S., Chan, Y. H., Rajapakse, J. C., Initiative, A. D. N., et al. (2021). Obtaining leaner deep neural networks for decoding brain functional connectome in a single shot. Neurocomputing, 453:326–336.
  • He et al., (2016) He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
  • Hedström et al., (2023) Hedström, A., Weber, L., Krakowczyk, D., Bareeva, D., Motzkus, F., Samek, W., Lapuschkin, S., and Höhne, M. M. M. (2023). Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond. Journal of Machine Learning Research, 24(34):1–11.
  • Iwana et al., (2019) Iwana, B. K., Kuroki, R., and Uchida, S. (2019). Explaining Convolutional Neural Networks Using Softmax Gradient Layer-wise Relevance Propagation. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 4176–4185. IEEE.
  • Jung et al., (2021) Jung, Y.-J., Han, S.-H., and Choi, H.-J. (2021). Explaining CNN and RNN Using Selective Layer-Wise Relevance Propagation. IEEE Access, 9:18670–18681.
  • Kohlbrenner et al., (2020) Kohlbrenner, M., Bauer, A., Nakajima, S., Binder, A., Samek, W., and Lapuschkin, S. (2020). Towards best practice in explaining neural network decisions with LRP. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE.
  • Lemanczyk et al., (2024) Lemanczyk, M. S., Bartoszewicz, J. M., and Renard, B. Y. (2024). Motif Interactions Affect Post-Hoc Interpretability of Genomic Convolutional Neural Networks. bioRxiv preprint bioRxiv:2024.02.15.580353.
  • Litjens et al., (2017) Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der Laak, J. A., Van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88.
  • Lundberg and Lee, (2017) Lundberg, S. M. and Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
  • Montavon et al., (2019) Montavon, G., Binder, A., Lapuschkin, S., Samek, W., and Müller, K.-R. (2019). Layer-Wise Relevance Propagation: An Overview. Explainable AI: interpreting, explaining and visualizing deep learning, pages 193–209.
  • Montavon et al., (2018) Montavon, G., Samek, W., and Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital signal processing, 73:1–15.
  • Russakovsky et al., (2015) Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252.
  • Samek et al., (2016) Samek, W., Binder, A., Montavon, G., Lapuschkin, S., and Müller, K.-R. (2016). Evaluating the visualization of what a deep neural network has learned. IEEE transactions on neural networks and learning systems, 28(11):2660–2673.
  • Samek et al., (2021) Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., and Müller, K.-R. (2021). Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE, 109(3):247–278.
  • Schnake et al., (2021) Schnake, T., Eberle, O., Lederer, J., Nakajima, S., Schütt, K. T., Müller, K.-R., and Montavon, G. (2021). Higher-order explanations of graph neural networks via relevant walks. IEEE transactions on pattern analysis and machine intelligence, 44(11):7581–7596.
  • Shi et al., (2015) Shi, J., Yan, Q., Xu, L., and Jia, J. (2015). Hierarchical image saliency detection on extended cssd. IEEE transactions on pattern analysis and machine intelligence, 38(4):717–729.
  • Simonyan and Zisserman, (2015) Simonyan, K. and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations.
  • Sixt et al., (2020) Sixt, L., Granz, M., and Landgraf, T. (2020). When explanations lie: Why many modified bp attributions fail. In International Conference on Machine Learning, pages 9046–9057. PMLR.
  • Yeom et al., (2021) Yeom, S.-K., Seegerer, P., Lapuschkin, S., Binder, A., Wiedemann, S., Müller, K.-R., and Samek, W. (2021). Pruning by explaining: A novel criterion for deep neural network pruning. Pattern Recognition, 115:107899.
  • Zhu and Gupta, (2017) Zhu, M. and Gupta, S. (2017). To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878.

Appendix

A

Figure 6: Results for the ImageNet dataset for metrics sparsity, faithfulness, and robustness for different proportions of pruned relevance mass $p$ for models VGG16 and ResNet50. For sparsity and faithfulness, the y-axis covers the whole output domain of $[0,1]$. The zoom plots focus on the actual covered output domain for better comparison of the methods.

Figure 7: Results for the ImageNet dataset for faithfulness for an exemplary parameterization $p=0.15$ for models VGG16 (left) and ResNet50 (right). The prediction score $f_{c^{*}}(\boldsymbol{x})$ drops similarly steeply as for the LRP baseline for the features with the highest relevance that are perturbed first. The difference in AUC is rather driven by the less important features that are perturbed later.

B

Figure 8: Results for genomics for metrics sparsity and localization for different proportions of pruned relevance mass $p$ for a CNN with 32 filters (a) and 4 filters (b). The y-axis covers the whole output domain of $[0,1]$. The zoom plots focus on the actual covered output domain for better comparison of the methods.