Multi-label classification of retinal disease via a novel vision transformer model

Front Neurosci. 2024 Jan 8:17:1290803. doi: 10.3389/fnins.2023.1290803. eCollection 2023.

Abstract

Introduction: The precise identification of retinal disorders is of utmost importance in the prevention of both temporary and permanent visual impairment. Prior research has yielded encouraging results in classifying retinal images associated with a single retinal condition. In clinical practice, however, a single patient frequently presents with multiple retinal disorders concurrently. Multi-label classification of retinal images therefore remains a significant obstacle for existing methodologies, yet solving it would allow a diverse array of conditions to be assessed simultaneously.

Methods: This study presents a novel vision transformer architecture, called retinal ViT, which incorporates the self-attention mechanism into the field of medical image analysis. Note that this study aims to demonstrate that transformer-based models can achieve performance competitive with CNN-based models; hence, all convolutional modules have been eliminated from the proposed model. The model concludes with a multi-label classifier implemented as a two-layer feed-forward network with a sigmoid activation function.
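To make the described architecture concrete, the sketch below shows a convolution-free vision transformer with a two-layer sigmoid multi-label head, assuming PyTorch. The class name RetinalViTSketch, the embedding dimensions, and the 8-label output (ODIR-2019 defines eight categories) are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch, assuming PyTorch; dimensions and names are illustrative,
# not the authors' exact retinal ViT configuration.
import torch
import torch.nn as nn


class RetinalViTSketch(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=768,
                 depth=12, heads=12, num_labels=8):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        patch_dim = 3 * patch_size * patch_size
        self.patch_size = patch_size
        # Patch embedding via a plain linear projection (no convolution).
        self.to_patch_embed = nn.Linear(patch_dim, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Two-layer feed-forward multi-label classifier with sigmoid output;
        # for training, this pairs with nn.BCELoss.
        self.classifier = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.GELU(),
            nn.Linear(dim // 2, num_labels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        # Rearrange the image into flattened, non-overlapping patches.
        x = x.unfold(2, p, p).unfold(3, p, p)          # (b, c, h/p, w/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        x = self.to_patch_embed(x) + self.pos_embed[:, 1:]
        cls = self.cls_token.expand(b, -1, -1) + self.pos_embed[:, :1]
        x = torch.cat([cls, x], dim=1)
        x = self.encoder(x)
        return self.classifier(x[:, 0])                # per-label probabilities


probs = RetinalViTSketch()(torch.randn(2, 3, 224, 224))   # shape (2, 8)
```

Because the sigmoid is applied per label rather than via a softmax over classes, each disease probability is predicted independently, which is what permits several conditions to be flagged for the same image.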

Results and discussion: The experimental findings provide evidence that the proposed model outperforms state-of-the-art approaches such as ResNet, VGG, DenseNet, and MobileNet on the publicly available ODIR-2019 dataset in terms of Kappa, F1 score, AUC, and AVG.
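For reference, a minimal sketch of how these metrics could be computed is given below, assuming scikit-learn and assuming that AVG denotes the mean of Kappa, F1, and AUC, as in the ODIR-2019 challenge protocol; the labels, probabilities, and 0.5 threshold are illustrative, not the paper's data.

```python
# A hedged sketch of the named metrics, assuming scikit-learn; multi-hot
# labels are flattened before scoring, mirroring the ODIR-2019 convention.
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score, roc_auc_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])        # multi-hot labels
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])
y_pred = (y_prob >= 0.5).astype(int)                         # 0.5 threshold

kappa = cohen_kappa_score(y_true.ravel(), y_pred.ravel())
f1 = f1_score(y_true.ravel(), y_pred.ravel())
auc = roc_auc_score(y_true.ravel(), y_prob.ravel())
avg = (kappa + f1 + auc) / 3.0                               # assumed AVG
print(f"Kappa={kappa:.3f} F1={f1:.3f} AUC={auc:.3f} AVG={avg:.3f}")
```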

Keywords: deep learning; machine vision; medical image analysis; multi-label classification; retinal image.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Natural Science Foundation of Shandong Province, grant number ZR2020MF133.