Measurement of laryngeal elevation by automated segmentation using Mask R-CNN

Medicine (Baltimore). 2021 Dec 23;100(51):e28112. doi: 10.1097/MD.0000000000028112.

Abstract

The methods of measuring laryngeal elevation during swallowing are time-consuming. We aimed to propose a quick-to-use neural network (NN) model for measuring laryngeal elevation quantitatively using anatomical structures auto-segmented by Mask region-based convolutional NN (R-CNN) in videofluoroscopic swallowing study. Twelve videofluoroscopic swallowing study video clips were collected. One researcher drew the anatomical structure, including the thyroid cartilage and vocal fold complex (TVC) on respective video frames. The dataset was split into 11 videos (4686 frames) for model development and one video (532 frames) for derived model testing. The validity of the trained model was evaluated using the intersection over the union. The mean intersections over union of the C1 spinous process and TVC were 0.73 ± 0.07 [0-0.88] and 0.43 ± 0.19 [0-0.79], respectively. The recall rates for the auto-segmentation of the TVC and C1 spinous process by the Mask R-CNN were 86.8% and 99.8%, respectively. Actual displacement of the larynx was calculated using the midpoint of the auto-segmented TVC and C1 spinous process and diagonal lengths of the C3 and C4 vertebral bodies on magnetic resonance imaging, which measured 35.1 mm. Mask R-CNN segmented the TVC with high accuracy. The proposed method measures laryngeal elevation using the midpoint of the TVC and C1 spinous process, auto-segmented by Mask R-CNN. Mask R-CNN auto-segmented the TVC with considerably high accuracy. Therefore, we can expect that the proposed method will quantitatively and quickly determine laryngeal elevation in clinical settings.

MeSH terms

  • Humans
  • Image Processing, Computer-Assisted
  • Larynx / diagnostic imaging*
  • Magnetic Resonance Imaging
  • Neural Networks, Computer*
  • Vocal Cords / diagnostic imaging