Binary Transformer Based on the Alignment and Correction of Distribution

Sensors (Basel). 2024 Dec 22;24(24):8190. doi: 10.3390/s24248190.

Abstract

The transformer is a powerful model widely used in artificial intelligence applications. However, its complex structure and extremely high computational requirements make it unsuitable for embedded intelligent sensors with limited computational resources. Binary quantization reduces memory footprint and accelerates computation, yet it has seldom been studied for lightweight transformers. Compared with full-precision networks, the key bottleneck is the distribution shift introduced by existing binary quantization methods. To tackle this problem, a feature distribution alignment operation for binarization is investigated: a median shift and mean restore scheme is designed to keep the binary feature distribution consistent with that of the full-precision transformer. A knowledge distillation architecture for distribution correction is then developed, with a teacher-student structure comprising a full-precision teacher and a binary student, to further rectify the feature distribution of the binary student network and preserve the completeness and accuracy of the features. Experimental results on the CIFAR10, CIFAR100, ImageNet-1k, and TinyImageNet datasets show the effectiveness of the proposed binary optimization model, which outperforms previous state-of-the-art binarization mechanisms at the same computational complexity.
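The median shift and mean restore operation described above can be sketched as follows. This is an illustrative NumPy reconstruction from the abstract alone, not the paper's exact formulation: the feature tensor is centered by its median before the sign function (so the binary split is balanced), and the mean absolute deviation is used as the restoring scale so the binarized features retain the magnitude of the full-precision distribution.

```python
import numpy as np

def binarize_with_alignment(x):
    """Hypothetical median-shift / mean-restore binarization.

    1. Shift features by their median so sign() splits them evenly.
    2. Binarize the shifted features to {-1, +1}.
    3. Restore magnitude with the mean absolute deviation, then add
       the median back so the binary distribution stays aligned with
       the full-precision one.
    """
    median = np.median(x)
    shifted = x - median                  # median shift
    binary = np.sign(shifted)             # {-1, 0, +1}
    binary[binary == 0] = 1.0             # map exact zeros to +1
    scale = np.mean(np.abs(shifted))      # mean restore factor
    return binary * scale + median        # aligned binary features

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=1000)
xb = binarize_with_alignment(x)
```

Under this sketch the output takes only two values per tensor, yet its median matches the input's and its mean absolute deviation about the median is preserved, which is the distribution-alignment property the abstract refers to.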

Keywords: binary transformer; distribution alignment; distribution correction; knowledge distillation.