A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks

Pengcheng Jiang; Ferrante Neri; Yu Xue; Ujjwal Maulik

doi:10.1142/S0129065724500631

Int J Neural Syst. 2024 Dec;34(12):2450063. doi: 10.1142/S0129065724500631. Epub 2024 Aug 31.

Authors

Pengcheng Jiang¹, Ferrante Neri², Yu Xue¹, Ujjwal Maulik³

Affiliations

¹ School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, P. R. China.
² NICE Research Group, School of Computer Science and Electronic Engineering, University of Surrey, Guildford GU2 7XS, UK.
³ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.

PMID: 39212940
DOI: 10.1142/S0129065724500631

Abstract

In many modern machine learning (ML) models, attention mechanisms (AMs) play a crucial role in processing data and identifying significant parts of the inputs, whether these are text or images. This selective focus enables subsequent stages of the model to achieve improved classification performance. Traditionally, AMs are applied as a preprocessing substructure before a neural network, such as in encoder/decoder architectures. In this paper, we extend the application of AMs to intermediate stages of data propagation within ML models. Specifically, we propose a generalized attention mechanism (GAM), which can be integrated before each layer of a neural network for classification tasks. At each layer/step of the ML architecture, the proposed GAM identifies the most relevant sections of the intermediate results. Our experimental results demonstrate that incorporating the proposed GAM into various ML models consistently enhances the accuracy of these models. This improvement is achieved with only a marginal increase in the number of parameters, which does not significantly affect the training time.

Keywords: Convolutional neural networks; deep learning; deep neural networks.

MeSH terms

Attention* / physiology
Humans
Machine Learning*
Neural Networks, Computer*