Combining the Variational and Deep Learning Techniques for Classification of Video Capsule Endoscopic Images

Bhavana Singh; Pushpendra Kumar; Shailendra Kumar Jain

doi:10.1007/s10278-024-01352-y

J Imaging Inform Med. 2025 Jan 3. doi: 10.1007/s10278-024-01352-y. Online ahead of print.

Authors

Bhavana Singh¹, Pushpendra Kumar², Shailendra Kumar Jain³

Affiliations

¹ Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, 462003, India.
² Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, 462003, India.
³ Gastroenterology Department of Gandhi Medical College, Bhopal, 462003, India.

PMID: 39753827

Abstract

Gastrointestinal tract-related cancers pose a significant health burden, with high mortality rates. To detect anomalies of the gastrointestinal tract that may progress to cancer, a video capsule endoscopy procedure is employed. The number of video capsule endoscopic (VCE) images produced per examination is enormous, which necessitates hours of analysis by clinicians. Therefore, there is a pressing need for automated computer-aided lesion classification techniques. Computer-aided systems utilize deep learning (DL) techniques, as they can potentially enhance anomaly detection rates. However, most of the DL techniques available in the literature classify static frames and therefore use only the spatial information of the image. In addition, they perform only binary classification. Thus, the presented work proposes a framework that performs multi-class classification of VCE images by using the dynamic information of the images. The proposed algorithm combines a fractional order variational model with a DL model. The fractional order variational model captures the dynamic information of VCE images by estimating optical flow color maps. These optical flow color maps are fed to the DL model for training. The DL model performs the multi-class classification task and localizes the region of interest with the maximum class score. The DL model is inspired by the Faster RCNN approach, and its backbone architecture is EfficientNet B0. The proposed framework achieves an average AUC of 0.98, an mAP of 0.93, and a balanced accuracy of 0.878. Hence, the proposed model is efficient in VCE image classification and in detection of the region of interest.
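To illustrate what an optical flow color map is, the following minimal sketch converts a dense flow field into an RGB image. It is not the authors' fractional order variational model: the flow field is assumed to be already estimated by some method, and the hue-for-direction, saturation-for-magnitude encoding is a common visualization convention that the abstract does not confirm. The function name `flow_to_color` and the parameter `max_magnitude` are hypothetical.

```python
import numpy as np


def flow_to_color(flow, max_magnitude=None):
    """Map a dense flow field of shape (H, W, 2) to a uint8 RGB image (H, W, 3).

    Assumed convention (not taken from the paper): hue encodes the flow
    direction, saturation encodes the normalized flow magnitude, and the
    brightness is fixed at 1, so pixels with no motion appear white.
    """
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)

    # Direction in (-pi, pi] rescaled to a hue in [0, 1].
    hue = (np.arctan2(v, u) + np.pi) / (2.0 * np.pi)

    # Normalize the magnitude; guard against an all-zero flow field.
    if max_magnitude is None:
        max_magnitude = float(magnitude.max())
    saturation = np.clip(magnitude / max(max_magnitude, 1e-8), 0.0, 1.0)

    # Vectorized HSV -> RGB conversion with value fixed at 1.
    offsets = np.array([0.0, 4.0, 2.0])
    pure_hue = np.clip(
        np.abs((hue[..., None] * 6.0 + offsets) % 6.0 - 3.0) - 1.0, 0.0, 1.0
    )
    s = saturation[..., None]
    rgb = (1.0 - s) + s * pure_hue

    return np.round(rgb * 255.0).astype(np.uint8)


if __name__ == "__main__":
    # Toy flow field: one pixel moves to the right, the rest are static.
    flow = np.zeros((2, 2, 2))
    flow[0, 0] = [1.0, 0.0]
    print(flow_to_color(flow))
```

An image produced this way has three channels like an ordinary frame, so in principle it can be passed to a detector that expects RGB input, which is consistent with the abstract's description of feeding the color maps to the DL model.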

Keywords: Faster RCNN; Gastrointestinal tract; Multi-class classification; Optical flow; Video capsule endoscopy.