Improving real-time detection of laryngeal lesions in endoscopic images using a decoupled super-resolution enhanced YOLO

Comput Methods Programs Biomed. 2024 Dec 13:260:108539. doi: 10.1016/j.cmpb.2024.108539. Online ahead of print.

Abstract

Background and objective: Laryngeal Cancer (LC) accounts for approximately one third of head and neck cancers. Detecting lesions in this anatomical region at an early stage is crucial for achieving a high survival rate. However, early detection is diagnostically challenging because lesions vary widely in appearance and must be characterized precisely to guide appropriate clinical management. Conventional diagnosis relies heavily on endoscopic examination, which often requires expert interpretation and can be limited by subjective assessment. Deep learning (DL) approaches offer promising opportunities for automating lesion detection, but their ability to handle multi-modal imaging data and to accurately localize small lesions is still under investigation. Furthermore, clinical practice could benefit greatly from efficient DL methods that ensure equitable access to advanced technologies, even where resources are limited. In this study, a DL-based approach named SRE-YOLO was introduced to provide real-time assistance to less-experienced personnel during laryngeal assessment by automatically detecting lesions at different scales in endoscopic White Light (WL) and Narrow-Band Imaging (NBI) images.

Methods: During training, SRE-YOLO combines a YOLOv8 nano (YOLOv8n) baseline with a Super-Resolution (SR) branch to enhance lesion detection. The SR branch is decoupled at inference time, so the model retains the low computational demand of the YOLOv8n baseline. The evaluation was conducted on a multi-center dataset encompassing diverse laryngeal pathologies and acquisition modalities.
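The training-only branch described in the Methods can be illustrated with a minimal pure-Python sketch. It assumes a generic auxiliary-task setup: a shared encoder feeds both a detection head and an SR head, the two losses are summed with a weight `sr_weight`, and the SR head is simply dropped at inference. The class `SREStyleModel`, the toy heads, the mean-squared-error losses, and the weight value are all hypothetical stand-ins, not the authors' YOLOv8n architecture or loss functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]


@dataclass
class SREStyleModel:
    """Hypothetical sketch: shared encoder + detection head, with an auxiliary
    super-resolution (SR) head that is only used during training."""
    encoder: Callable[[Vector], Vector]
    det_head: Callable[[Vector], Vector]
    sr_head: Optional[Callable[[Vector], Vector]] = None

    def forward(self, x: Vector, training: bool) -> Tuple[Vector, Optional[Vector]]:
        feats = self.encoder(x)              # shared features
        det = self.det_head(feats)           # detection output (always computed)
        if training and self.sr_head is not None:
            return det, self.sr_head(feats)  # SR output only while training
        return det, None                     # inference path: detection head only

    def decouple(self) -> None:
        """Drop the SR head so the deployed model has the baseline's cost."""
        self.sr_head = None


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(model: SREStyleModel, x: Vector, det_target: Vector,
                  hr_target: Vector, sr_weight: float = 0.1) -> float:
    """Assumed joint objective: detection loss + sr_weight * SR reconstruction loss."""
    det, sr = model.forward(x, training=True)
    loss = mse(det, det_target)
    if sr is not None:
        loss += sr_weight * mse(sr, hr_target)
    return loss


# Toy components, chosen only to make the sketch runnable (hypothetical).
model = SREStyleModel(
    encoder=lambda x: [2.0 * v for v in x],
    det_head=lambda f: [sum(f)],
    sr_head=lambda f: [v for v in f for _ in range(2)],  # 2x nearest-neighbour upsampling
)
```

Because `forward(..., training=False)` never evaluates `sr_head`, and `decouple()` removes it entirely, the deployed cost is that of the encoder and detection head alone. Under these assumptions, this is how an auxiliary SR branch can shape the shared features during training without adding any inference-time overhead.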

Results: SRE-YOLO improved the Average Precision at an Intersection-over-Union threshold of 0.5 (AP@IoU=0.5) for lesion detection by 5% relative to the YOLOv8n baseline, while maintaining its inference speed of 58.8 Frames Per Second (FPS). Comparative analyses against state-of-the-art DL methods highlighted the ability of SRE-YOLO to balance detection accuracy, computational efficiency, and real-time applicability.
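The AP@IoU=0.5 metric counts a predicted box as a true positive only if it overlaps a not-yet-matched ground-truth box with an IoU of at least 0.5, and then integrates precision over recall. The sketch below assumes single-class, axis-aligned boxes in `(x1, y1, x2, y2)` format, greedy matching in descending confidence order, and all-point interpolation of the precision-recall curve (as in PASCAL VOC from 2010 onward). The function names are illustrative, and the exact evaluation protocol used in the study (for example, COCO-style 101-point interpolation) may differ.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[str, float, Box]],
                      gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """Single-class AP at a fixed IoU threshold with all-point interpolation.

    preds: (image_id, confidence, box) triples; gts: image_id -> ground-truth boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}

    # Greedy matching: highest-confidence predictions claim ground truths first.
    hits: List[int] = []
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive: no overlap, IoU too low, or duplicate

    # Cumulative precision and recall after each ranked prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    cum_tp = 0
    for rank, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / rank)
        recalls.append(cum_tp / n_gt)

    # All-point interpolation: each recall increment is weighted by the best
    # precision reached at that recall level or any higher one.
    ap, prev_recall = 0.0, 0.0
    for k, recall in enumerate(recalls):
        ap += (recall - prev_recall) * max(precisions[k:])
        prev_recall = recall
    return ap
```

For example, with one lesion in each of two images, a perfect detection in the first image, a confident miss in the second, and a later correct box (IoU 0.81) in the second, precision drops to 0.5 before recovering to 2/3 at full recall, giving AP = 0.5 × 1 + 0.5 × 2/3 = 5/6. On the throughput side, 58.8 FPS corresponds to roughly 17 ms per frame (1000 / 58.8 ≈ 17.0 ms).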

Conclusions: This research underscores the potential of SRE-YOLO for developing efficient DL-driven decision support systems that detect laryngeal lesions at different scales in real time from both WL and NBI endoscopic data.

Keywords: Decision support system; Deep learning; Endoscopy; Laryngeal lesions; Real-time assistance; Super resolution.