Binary Transformer Based on the Alignment and Correction of Distribution

Sensors (Basel). 2024 Dec 22;24(24):8190. doi: 10.3390/s24248190.

Abstract

The transformer is a powerful model widely used in artificial intelligence applications. However, its complex structure and extremely high computational requirements make it unsuitable for embedded intelligent sensors with limited computational resources. Binary quantization reduces memory footprint and accelerates computation, yet it has seldom been studied for lightweight transformers. Compared with full-precision networks, the key bottleneck is the distribution shift introduced by existing binary quantization methods. To tackle this problem, a feature distribution alignment operation for binarization is investigated: a median shift and mean restore scheme is designed to keep the binary feature distribution consistent with that of the full-precision transformer. A knowledge distillation architecture for distribution correction is then developed, with a teacher-student structure comprising a full-precision teacher and a binary student, to further rectify the feature distribution of the binary student network and preserve the completeness and accuracy of the features. Experimental results on the CIFAR10, CIFAR100, ImageNet-1k, and TinyImageNet datasets show the effectiveness of the proposed binary optimization model, which outperforms previous state-of-the-art binarization mechanisms at the same computational complexity.
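The median shift and mean restore operation described above can be sketched as follows. This is an illustrative NumPy reconstruction from the abstract alone, not the paper's exact formulation: the feature tensor is centered by its median before the sign function (so the binary split is balanced), and the mean absolute deviation is used as the restoring scale so the binarized features retain the magnitude of the full-precision distribution.

```python
import numpy as np

def binarize_with_alignment(x):
    """Hypothetical median-shift / mean-restore binarization.

    1. Shift features by their median so sign() splits them evenly.
    2. Binarize the shifted features to {-1, +1}.
    3. Restore magnitude with the mean absolute deviation, then add
       the median back so the binary distribution stays aligned with
       the full-precision one.
    """
    median = np.median(x)
    shifted = x - median                  # median shift
    binary = np.sign(shifted)             # {-1, 0, +1}
    binary[binary == 0] = 1.0             # map exact zeros to +1
    scale = np.mean(np.abs(shifted))      # mean restore factor
    return binary * scale + median        # aligned binary features

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=1000)
xb = binarize_with_alignment(x)
```

Under this sketch the output takes only two values per tensor, yet its median matches the input's and its mean absolute deviation about the median is preserved, which is the distribution-alignment property the abstract refers to.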

Keywords: binary transformer; distribution alignment; distribution correction; knowledge distillation.