Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving

Rui Zhao; Yuze Fan; Yun Li; Dong Zhang; Fei Gao; Zhenhai Gao; Zhengcai Yang

doi:10.3390/s25010191

Sensors (Basel). 2025 Jan 1;25(1):191. doi: 10.3390/s25010191.

Authors

Rui Zhao¹, Yuze Fan¹, Yun Li², Dong Zhang³, Fei Gao¹,⁴, Zhenhai Gao¹,⁴, Zhengcai Yang⁵

Affiliations

¹ College of Automotive Engineering, Jilin University, Changchun 130025, China.
² Graduate School of Information and Science Technology, The University of Tokyo, Tokyo 113-8654, Japan.
³ Department of Mechanical and Aerospace Engineering, Brunel University London, Uxbridge UB8 3PH, UK.
⁴ National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130025, China.
⁵ Key Laboratory of Automotive Power Train and Electronics, Hubei University of Automotive Technology, Shiyan 442002, China.

PMID: 39796987
DOI: 10.3390/s25010191

Abstract

Autonomous driving systems have demonstrated impressive capabilities, with behavior decision-making playing a crucial role as the bridge between perception and control. Imitation Learning (IL) and Reinforcement Learning (RL) have introduced new approaches to behavior decision-making in autonomous driving, but challenges remain. First, RL policy networks often lack sufficient reasoning ability to make optimal decisions in highly complex and stochastic environments. Second, the complexity of these environments leads to low sample efficiency in RL, making it difficult to learn driving policies efficiently. To address these challenges, we propose a Knowledge Distillation-Enhanced Behavior Transformer (KD-BeT) framework. Building on the successful application of Transformers in large language models, we introduce the Behavior Transformer as the policy network in RL, using observation-action history as input for sequential decision-making and thereby leveraging the Transformer's contextual reasoning capabilities. Using a teacher-student paradigm, we first train a small-capacity teacher model quickly and accurately through IL, then apply knowledge distillation to improve the training efficiency and performance of RL. Simulation results demonstrate that KD-BeT achieves fast convergence and high asymptotic performance during training. In the CARLA NoCrash benchmark tests, KD-BeT outperforms other state-of-the-art methods in terms of traffic efficiency and driving safety, offering a novel solution for real-world autonomous driving tasks.

Keywords: autonomous driving; behavior transformer; decision-making; imitation learning; knowledge distillation; reinforcement learning.
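
The teacher-student idea in the abstract can be illustrated with a generic distillation term added to an RL objective. The abstract does not specify the loss used in KD-BeT, so the following is only a minimal sketch under stated assumptions: a discrete action space, a temperature-softened KL divergence between teacher and student action distributions, and a fixed weighting factor. All function and parameter names (`distillation_loss`, `combined_loss`, `kd_weight`, `temperature`) are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert action logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened action distributions.

    Assumption: a generic soft-target distillation term, not necessarily
    the formulation used by the KD-BeT authors.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log of zero if a student probability underflows
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def combined_loss(rl_loss, teacher_logits, student_logits,
                  kd_weight=0.5, temperature=2.0):
    """Hypothetical total objective: RL loss plus a weighted distillation term."""
    kd = distillation_loss(teacher_logits, student_logits, temperature)
    return rl_loss + kd_weight * kd
```

In this sketch, the distillation term vanishes when the student's action distribution matches the teacher's and grows as the two diverge, so the IL-trained teacher steers the RL student early in training. How the weighting is scheduled, and whether the targets are logits, features, or actions, is left open here because the abstract does not say.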
