Feature attribution methods are a popular approach for explaining the decisions of convolutional neural networks (CNNs). Because they are local explainability tools, they explain individual predictions and do not by themselves offer a systematic evaluation of how meaningful those explanations are globally. This limitation often gives rise to confirmation bias, where explanations are crafted after the fact. We therefore conducted a systematic investigation of feature attribution methods on electrocardiogram time series, focusing on the R-peak, T-wave, and P-wave. Using a simulated dataset with modifications limited to the R-peak and T-wave, we evaluated various feature attribution techniques across two CNN architectures and explainability frameworks. Extending the analysis to real-world data showed that feature attribution maps do highlight the significant regions, but the maps lack clarity and remain blurry even under the idealized simulated conditions.
Keywords: CNN; Electrocardiogram; Explainability; Machine Learning.