PSPN: Pseudo-Siamese Pyramid Network for multimodal emotion analysis

Cogn Neurodyn. 2024 Oct;18(5):2883-2896. doi: 10.1007/s11571-024-10123-y. Epub 2024 May 28.

Abstract

Emotion recognition plays an important role in human life and healthcare. Electroencephalography (EEG) has been extensively studied as an objective indicator of intense emotions. However, existing methods do not sufficiently analyze both shallow and deep EEG features. In addition, human emotions are complex and variable, making it difficult to represent them comprehensively with a single-modality signal. Eye-related signals, which are associated with gaze tracking and eye movement detection, provide various forms of supplementary information for multimodal emotion analysis. Therefore, we propose a Pseudo-Siamese Pyramid Network (PSPN) for multimodal emotion analysis. The PSPN model employs a Depthwise Separable Convolutional Pyramid (DSCP) to extract and integrate intrinsic emotional features at various levels and scales from EEG signals. In parallel, a fully connected subnetwork extracts external emotional features from eye-related signals. Finally, we introduce a Pseudo-Siamese network that integrates a flexible cross-modal dual-branch subnetwork to jointly exploit EEG emotional features and eye-related behavioral features, achieving consistency and complementarity in multimodal emotion recognition. For evaluation, we conducted experiments on the public DEAP and SEED-IV datasets. The experimental results demonstrate that multimodal fusion significantly improves the accuracy of emotion recognition compared to single-modal approaches. Our PSPN model achieved best accuracies of 96.02% and 96.45% on the valence and arousal dimensions of the DEAP dataset, respectively, and 77.81% on the SEED-IV dataset. Our code is available at https://github.com/Yinyanyan003/PSPN.git.

Keywords: Depthwise Separable Convolution; EEG; Emotion recognition; Multimodal; Pseudo-Siamese network; Pyramid network.
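
To make the building blocks named in the abstract more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). It shows a generic depthwise separable 1D convolution, a toy multi-scale "pyramid" over several kernel sizes, a single fully connected layer for the second modality, and fusion by concatenation. All function names, layer sizes, kernel sizes, the uniform placeholder weights, and the choice of concatenation as the fusion step are assumptions made purely for illustration.

```python
def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel ('valid' padding, stride 1).

    x: list of channels, each a list of samples.
    kernels: one kernel (list of weights) per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([sum(channel[t + i] * kernel[i] for i in range(k))
                    for t in range(len(channel) - k + 1)])
    return out


def pointwise_conv1d(x, weights):
    """1x1 convolution: mix channels at every time step.

    weights: one row per output channel, one coefficient per input channel.
    """
    length = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
            for row in weights]


def standard_conv_params(c_in, c_out, k):
    """Weight count of an ordinary 1D convolution (bias ignored)."""
    return c_in * c_out * k


def separable_conv_params(c_in, c_out, k):
    """Weight count of depthwise (c_in * k) plus pointwise (c_in * c_out)."""
    return c_in * k + c_in * c_out


def global_avg_pool(x):
    """Collapse each channel to its mean value."""
    return [sum(ch) / len(ch) for ch in x]


def eeg_pyramid_branch(x, kernel_sizes, c_out):
    """Hypothetical pyramid: one separable conv per kernel size, then pooling.

    Uniform averaging weights stand in for learned parameters.
    """
    c_in = len(x)
    features = []
    for k in kernel_sizes:
        dw = [[1.0 / k] * k for _ in range(c_in)]
        pw = [[1.0 / c_in] * c_in for _ in range(c_out)]
        features.extend(global_avg_pool(pointwise_conv1d(depthwise_conv1d(x, dw), pw)))
    return features


def eye_dense_branch(v, weights):
    """Hypothetical fully connected layer with ReLU for eye-related features."""
    return [max(0.0, sum(w * a for w, a in zip(row, v))) for row in weights]


def fuse(eeg_features, eye_features):
    """Assumed fusion step: simple concatenation of the two branch outputs."""
    return eeg_features + eye_features


if __name__ == "__main__":
    eeg = [[1.0] * 16 for _ in range(4)]   # 4 toy channels, 16 samples each
    eye = [0.5, -0.25, 1.0]                # 3 toy eye-related features
    eye_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    fused = fuse(eeg_pyramid_branch(eeg, [3, 5, 7], 2), eye_dense_branch(eye, eye_w))
    print(len(fused), "fused features")
    print(standard_conv_params(32, 64, 3), "vs", separable_conv_params(32, 64, 3), "weights")
```

The two branches deliberately have different structures and do not share weights, which is what distinguishes a pseudo-Siamese arrangement from a standard Siamese one. The parameter-count helpers illustrate why depthwise separable convolutions are often chosen: the per-channel filtering and the channel mixing are factored into two cheaper steps.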