Dual-YOLO Architecture from Infrared and Visible Images for Object Detection

Chun Bao; Jie Cao; Qun Hao; Yang Cheng; Yaqian Ning; Tianhua Zhao

doi:10.3390/s23062934

Sensors (Basel). 2023 Mar 8;23(6):2934. doi: 10.3390/s23062934.

Authors

Chun Bao¹, Jie Cao^{1,2}, Qun Hao^{1,2,3}, Yang Cheng^{1,2}, Yaqian Ning¹, Tianhua Zhao¹

Affiliations

¹ Bionic Robot Key Laboratory of Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China.
² Yangtze Delta Region Academy, Beijing Institute of Technology, Jiaxing 314003, China.
³ School of Opto-Electronic Engineering, Changchun University of Science and Technology, Changchun 130022, China.

Abstract

With the development of infrared detection technology and growing demands in military remote sensing, infrared object detection networks with few false alarms and high detection accuracy have become a research focus. However, because infrared images lack texture information, infrared object detection suffers from a high false detection rate, which reduces detection accuracy. To address these problems, we propose an infrared object detection network named Dual-YOLO, which integrates visible-image features. To maintain detection speed, we choose You Only Look Once v7 (YOLOv7) as the base framework and design dual feature-extraction channels for infrared and visible images. In addition, we develop attention fusion and fusion shuffle modules to reduce the detection errors caused by redundant information in the fused features. Moreover, we introduce Inception and squeeze-and-excitation (SE) modules to enhance the complementary characteristics of infrared and visible images. Furthermore, we design a fusion loss function that makes the network converge quickly during training. The experimental results show that the proposed Dual-YOLO network reaches 71.8% mean Average Precision (mAP) on the DroneVehicle remote sensing dataset and 73.2% mAP on the KAIST pedestrian dataset, and its detection accuracy reaches 84.5% on the FLIR dataset. The proposed architecture is expected to be applied in military reconnaissance, autonomous driving, and public safety.
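
The abstract names squeeze-and-excitation (SE) modules and a "fusion shuffle" step but does not describe how they operate. As a rough orientation only, the following pure-Python toy (no deep-learning framework) shows the generic SE channel-reweighting pattern applied to concatenated infrared and visible feature maps, followed by a ShuffleNet-style channel shuffle that interleaves the two modalities. This is an assumption-laden sketch and not the authors' Dual-YOLO code. The function names (concat_channels, squeeze, excitation_weights, scale_channels, channel_shuffle, fuse), the concatenate-then-reweight-then-shuffle order, the reduction ratio, and the random weights are all hypothetical.

```python
import math
import random
from typing import List

# A feature map is stored as nested lists indexed [channel][height][width].
FeatureMap = List[List[List[float]]]


def concat_channels(ir: FeatureMap, vis: FeatureMap) -> FeatureMap:
    """Stack infrared and visible feature maps along the channel axis."""
    return ir + vis


def squeeze(x: FeatureMap) -> List[float]:
    """SE 'squeeze': global average pooling yields one descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def excitation_weights(z: List[float], w1: List[List[float]],
                       w2: List[List[float]]) -> List[float]:
    """SE 'excitation': FC (reduce) -> ReLU -> FC (expand) -> sigmoid."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]


def scale_channels(x: FeatureMap, s: List[float]) -> FeatureMap:
    """Multiply every value in channel c by its attention weight s[c]."""
    return [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(x, s)]


def channel_shuffle(x: list, groups: int) -> list:
    """ShuffleNet-style shuffle: view channels as (groups, n), transpose, flatten."""
    if len(x) % groups:
        raise ValueError("channel count must be divisible by groups")
    n = len(x) // groups
    return [x[g * n + i] for i in range(n) for g in range(groups)]


def fuse(ir: FeatureMap, vis: FeatureMap,
         w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Hypothetical fusion: concat -> SE reweighting -> shuffle the two modalities."""
    x = concat_channels(ir, vis)
    s = excitation_weights(squeeze(x), w1, w2)
    return channel_shuffle(scale_channels(x, s), groups=2)


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r = 4, 3, 3, 2  # channels per modality, spatial size, SE reduction ratio

    def random_map() -> FeatureMap:
        return [[[random.random() for _ in range(W)] for _ in range(H)]
                for _ in range(C)]

    ir, vis = random_map(), random_map()
    total = 2 * C  # channels after concatenation
    w1 = [[random.uniform(-1, 1) for _ in range(total)] for _ in range(total // r)]
    w2 = [[random.uniform(-1, 1) for _ in range(total // r)] for _ in range(total)]
    fused = fuse(ir, vis, w1, w2)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 8 3 3
```

Running the script prints the fused shape (8 channels of 3 × 3). With `groups=2`, the shuffle turns the channel order [ir0, ir1, vis0, vis1] into [ir0, vis0, ir1, vis1], so any later grouped operation would see both modalities; whether Dual-YOLO's fusion shuffle module works this way is not stated in the abstract.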

Keywords: attention fusion; dual-YOLO; fusion loss; fusion shuffle; infrared object detection.
