Book localization is crucial for the development of intelligent book inventory systems, where the high-precision detection of book spines is a critical requirement. However, the varying tilt angles and diverse aspect ratios of books on library shelves often reduce the effectiveness of conventional object detection algorithms. To address these challenges, this study proposes an enhanced oriented R-CNN algorithm for book spine detection. First, we replace the standard 3 × 3 convolutions in ResNet50's residual blocks with deformable convolutions to enhance the network's capacity for modeling the geometric deformations of book spines. Additionally, the PAFPN (Path Aggregation Feature Pyramid Network) was integrated into the neck structure to enhance multi-scale feature fusion. To further optimize the anchor box design, we introduce an adaptive initial cluster center selection method for K-median clustering. This allows for a more accurate computation of anchor box aspect ratios that are better aligned with the book spine dataset, enhancing the model's training performance. We conducted comparison experiments between the proposed model and other state-of-the-art models on the book spine dataset, and the results demonstrate that the proposed approach reaches an mAP of 90.22%, which outperforms the baseline algorithm by 4.47 percentage points. Our method significantly improves detection accuracy, making it highly effective for identifying book spines in real-world library environments.
Keywords: K-median clustering; book spine detection; deformable convolution; oriented R-CNN; secondary feature fusion.