Neurons are the fundamental units of neural networks. In this paper, we propose a method for explaining neural networks by visualizing the learning process of neurons. For a trained neural network, the proposed method obtains the features learned by each neuron and displays them in a human-understandable form. The features learned by different neurons are then combined to analyze the working mechanisms of different neural network models. The method can be applied to a neural network without any change to the model's architecture. In this study, we apply it to both Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs) trained with the backpropagation learning algorithm. We conduct experiments on image classification models to demonstrate the effectiveness of the method. Through these experiments, we gain insights into the working mechanisms of various neural network architectures and evaluate neural network interpretability from diverse perspectives.
Keywords: Interpretability; Neural network; Visualization.
Copyright © 2023 Elsevier Ltd. All rights reserved.