Synthetic aperture radar (SAR) automatic target recognition (ATR) is a crucial technique utilized in various scenarios of geoscience and remote sensing. Despite the remarkable success of convolutional neural networks (CNNs) in optical vision tasks, the application of CNNs in SAR ATR is still a challenging area due to the significant differences in the imaging mechanisms of SAR and optical images. This paper analytically addresses the cognitive gap of CNNs between optical and SAR images by leveraging multi-order interactions to measure their representation capacity. Furthermore, we propose a subjective evaluation strategy to compare human interactions with those of CNNs. Our findings reveal that CNNs operate differently for optical and SAR images. Specifically, for SAR images, CNNs' representation capacity is comparable to that of humans, as they can encode intermediate interactions better than simple and complex ones. In contrast, for optical images, CNNs excel at encoding simple and complex interactions, but not intermediate interactions.
Keywords: Interpretable CNN; Multi-order interaction; SAR imaging; Target recognition.
Copyright © 2023 Elsevier Ltd. All rights reserved.