Aerial images cover wide areas and capture rich scene information, but because they are usually taken from a high altitude they contain many small objects. Small objects are difficult to detect accurately because their features are indistinct and susceptible to background interference. We propose CPDD-YOLOv8 to improve small object detection. First, we propose the C2fGAM structure, which integrates the Global Attention Mechanism (GAM) into the C2f structure of the backbone so that the model can better capture the overall semantics of an image. Second, we add a detection layer, P2, to extract shallow features. Third, we propose a new DSC2f structure, which replaces the first standard Conv of the Bottleneck in the C2f structure with Dynamic Snake Convolution (DSConv) so that the model can adapt to different inputs more effectively. Finally, we use the Dynamic Head (DyHead), which integrates multiple attention mechanisms, in the head to assign different weights to different feature layers. To evaluate CPDD-YOLOv8, we carry out ablation and comparison experiments on the VisDrone2019 dataset. The ablation experiments show that every improved or added module in CPDD-YOLOv8 is effective. The comparison experiments indicate that the mAP of CPDD-YOLOv8 is higher than that of the seven other models compared. The mAP@0.5 of the model reaches 41%, which is 6.9% higher than that of YOLOv8, and its small object detection rate improves by 13.1%. The generalizability of CPDD-YOLOv8 is further verified on the WiderPerson, VOC_MASK and SHWD datasets.
Keywords: C2fGAM; P2; DSC2f; Small object detection; YOLOv8.
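The mAP figures quoted in the abstract depend on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The following is a minimal, illustrative sketch in plain Python of that matching step, not the authors' evaluation code. It assumes axis-aligned boxes given as (x1, y1, x2, y2), an IoU threshold of 0.5, and greedy matching in descending score order; the function names `iou` and `count_true_positives` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(
    preds: List[Tuple[Box, float]],  # (box, confidence score)
    gts: List[Box],
    thr: float = 0.5,  # assumed IoU threshold
) -> int:
    """Greedily match predictions to ground truth, highest score first.

    Each ground-truth box can be matched at most once; a prediction
    counts as a true positive if its best unmatched IoU is >= thr.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    return tp


# The same 4-pixel horizontal offset applied to a small and a large box.
small_gt, small_pred = (0, 0, 10, 10), (4, 0, 14, 10)
large_gt, large_pred = (0, 0, 100, 100), (4, 0, 104, 100)
print(round(iou(small_gt, small_pred), 2))
print(round(iou(large_gt, large_pred), 2))
```

In this toy case the 4-pixel offset gives an IoU of about 0.43 for the 10 x 10 box, which fails a 0.5 threshold, and about 0.92 for the 100 x 100 box, which passes. This is a general property of IoU-based matching and is one reason small objects tend to be penalized more heavily by localization error; it is offered as background and is not a claim taken from the paper.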
© 2025. The Author(s).