A non-local dual-stream fusion network for laryngoscope recognition

Am J Otolaryngol. 2024 Dec 17;46(1):104565. doi: 10.1016/j.amjoto.2024.104565. Online ahead of print.

Abstract

Purpose: To design and implement a deep learning model that automatically classifies laryngoscope images and assists physicians in diagnosing laryngeal diseases.

Materials and methods: The experiment used 3057 images from the Laryngoscope8 dataset, covering eight categories: normal, glottic cancer, granuloma, Reinke's edema, vocal cord cyst, leukoplakia, nodules, and polyps. A classification model based on deep neural networks was developed and tested. Model performance was evaluated with several measures, including accuracy, recall, specificity, F1-score, and the area under the receiver operating characteristic (ROC) curve (AUC). In addition, Grad-CAM was used to visualize the model's feature maps and improve the interpretability of the network.
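Grad-CAM weights each feature map A^k of a chosen convolutional layer by the spatial average of the gradient of the class score with respect to that map, sums the weighted maps, and applies a ReLU so that only regions with a positive influence on the class remain. The framework-free sketch below illustrates that computation. It assumes the activations and gradients have already been extracted from the network (the abstract does not say which layer was used), and it omits the final upsampling of the map to input resolution. The function name `grad_cam` is hypothetical.

```python
def grad_cam(activations, gradients):
    """Hypothetical Grad-CAM sketch for a single target class.

    activations: K feature maps A^k, each an H x W list of lists
    gradients:   K maps of d(class score)/dA^k with the same shape
    Returns an H x W heat map scaled to [0, 1] (no upsampling).
    """
    num_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # alpha_k: global-average-pool the gradient of each feature map
    alphas = []
    for grad in gradients:
        total = sum(sum(row) for row in grad)
        alphas.append(total / (height * width))

    # Weighted sum of the feature maps
    cam = [[0.0] * width for _ in range(height)]
    for k in range(num_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alphas[k] * activations[k][i][j]

    # ReLU keeps only positively contributing regions
    cam = [[max(v, 0.0) for v in row] for row in cam]

    # Scale to [0, 1] for display; leave an all-zero map unchanged
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

For example, with two 2×2 feature maps whose gradients average to +1 and -1 respectively, the first map is kept and the regions where the second map is active are suppressed by the ReLU.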

Results: The model achieved high classification accuracy and robustness and classified the various types of laryngoscope images accurately. On the test set of independent individuals, the overall accuracy was 86.51% and the average area under the curve was 0.954. The model performed significantly better than the existing algorithms it was compared against.
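The evaluation measures reported here can all be computed from the model's predictions on the test set. The plain-Python sketch below derives overall accuracy from a confusion matrix, per-class recall, specificity, and F1-score by treating each class one-vs-rest, and an average AUC as the unweighted mean of one-vs-rest AUCs, each computed with the Mann-Whitney rank formulation. The abstract does not state how the AUC was averaged, so macro one-vs-rest averaging is an assumption; all function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_metrics(cm):
    """One-vs-rest recall, specificity and F1-score for each class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c samples missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"recall": recall, "specificity": specificity, "f1": f1})
    return results


def binary_auc(scores, is_positive):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        return None  # undefined without both positives and negatives
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(prob_rows, y_true):
    """Unweighted mean of one-vs-rest AUCs; prob_rows[i][c] = P(class c | sample i)."""
    aucs = []
    for c in range(len(prob_rows[0])):
        scores = [row[c] for row in prob_rows]
        auc = binary_auc(scores, [t == c for t in y_true])
        if auc is not None:  # skip classes absent from the test set
            aucs.append(auc)
    return sum(aucs) / len(aucs)
```

In practice, scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average="macro"` computes this macro one-vs-rest AUC from a matrix of class probabilities.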

Conclusion: This paper proposes a deep learning-based model for the automatic classification of laryngoscope images. By fusing the output features of a ResNet and a Transformer, the model accurately classifies eight categories of laryngoscope images: normal and seven laryngeal diseases. These results indicate that the proposed method can be effectively applied to the study of laryngeal diseases.
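The abstract does not describe the internals of the non-local dual-stream fusion module named in the title. As a point of reference only, the sketch below shows the simplest form of feature-level fusion: concatenating a CNN (for example, ResNet) feature vector with a Transformer feature vector and passing the result through a linear layer and a softmax over the classes. Plain concatenation is a stand-in assumption, not the authors' method, and the function name and weight layout are hypothetical.

```python
import math


def fuse_and_classify(cnn_features, transformer_features, weights, bias):
    """Hypothetical late fusion: concatenate two feature vectors, then linear + softmax.

    weights: one row per class, each as long as the fused vector
    bias:    one value per class
    Returns class probabilities that sum to 1.
    """
    # Stand-in fusion step: plain concatenation of the two streams
    fused = list(cnn_features) + list(transformer_features)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")

    # Linear classification head
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]

    # Numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```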

Keywords: Computer aided diagnosis; Deep learning; Laryngeal diseases; Laryngoscope image; Transformer.