Deep learning model for automated diagnosis of degenerative cervical spondylosis and altered spinal cord signal on MRI

Spine J. 2024 Sep 30:S1529-9430(24)01038-6. doi: 10.1016/j.spinee.2024.09.015. Online ahead of print.

Abstract

Background context: A deep learning (DL) model for degenerative cervical spondylosis on MRI could enhance reporting consistency and efficiency, addressing a significant global health issue.

Purpose: To create a DL model to detect and classify cervical spinal cord signal abnormality, spinal canal stenosis, and neural foraminal stenosis.

Study design/setting: Retrospective study conducted from January 2013 to July 2021, excluding cases with instrumentation.

Patient sample: Overall, 504 cervical spine MRI examinations were analyzed (504 patients; mean age, 58 years ± 13.7 [SD]; 202 women), of which 454 (90%) were used for training and 50 (10%) for internal testing. In addition, 100 cervical spine MRI examinations were available for external testing (100 patients; mean age, 60 years ± 13.0 [SD]; 26 women).

Outcome measures: Automated detection and classification of spinal canal stenosis, neural foraminal stenosis, and cord signal abnormality using the DL model. Recall (%), inter-rater agreement (Gwet's kappa), sensitivity, and specificity were calculated.
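Gwet's kappa corrects observed agreement for chance agreement in a way that stays stable when one grade dominates. The abstract does not say whether a weighted variant (Gwet's AC2) was used, so the sketch below assumes the unweighted AC1 for two raters grading the same items; the function name `gwet_ac1` is hypothetical and p-values are not computed.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Optional[Sequence[Hashable]] = None,
) -> float:
    """Unweighted Gwet's AC1 agreement coefficient for two raters (illustrative sketch).

    `categories` is the full grading scale (e.g. range(4) for grades 0-3);
    if omitted, only the categories actually observed are used.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(rater_a)
    scale = list(categories) if categories is not None else sorted(set(rater_a) | set(rater_b))
    q = len(scale)
    if q < 2:
        raise ValueError("AC1 needs a scale with at least two categories")

    # Observed agreement p_a: share of items graded identically by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_k: average of the two raters' marginal proportions for category k.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pi = [(counts_a[k] + counts_b[k]) / (2 * n) for k in scale]

    # Gwet's chance agreement p_e = sum_k pi_k (1 - pi_k) / (q - 1); it is at most 1/q, so < 1.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical binary grades from two readers on four levels.
print(round(gwet_ac1([0, 0, 1, 1], [0, 1, 1, 1]), 3))  # 0.529
```

Here the readers agree on 3 of 4 levels (p_a = 0.75), chance agreement is 0.46875, and AC1 = 0.28125 / 0.53125 = 9/17. Passing the full scale via `categories` matters for ordinal grades, because a grade that no one assigned still counts toward q.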

Methods: Using axial T2-weighted gradient echo and sagittal T2-weighted images, a transformer-based DL model was trained on data labeled by an experienced musculoskeletal radiologist (12 years of experience). For internal testing, the reference standard was established by consensus of 2 musculoskeletal radiologists (each with 12 years of experience), and the test data were additionally labeled by 2 subspecialist radiologists and 2 in-training radiologists for comparison with the model. External testing was performed on the 100-patient external dataset.

Results: For classification using all grades, the DL model showed substantial agreement with the reference standard, surpassing all readers for both spinal canal stenosis (κ=0.78, p<.001 vs. κ range, 0.57-0.70 for readers) and neural foraminal stenosis (κ=0.80, p<.001 vs. κ range, 0.63-0.69 for readers). The DL model's recall for cord signal abnormality (92.3%) was similar to that of the readers (range, 92.3%-100.0%). Agreement was nearly perfect for binary classification (grades 0/1 vs. 2/3) (κ=0.95, p<.001 for spinal canal; κ=0.90, p<.001 for neural foramina). External testing showed substantial agreement using all grades (κ=0.76, p<.001 for spinal canal; κ=0.66, p<.001 for neural foramina) and high recall for cord signal abnormality (91.9%). For spinal canal and neural foraminal classification, the DL model demonstrated high sensitivity (range, 83.7%-92.4%) and specificity (range, 87.8%-98.3%) on both the internal and external datasets.
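The binary analysis collapses the four stenosis grades into 0/1 vs. 2/3. The abstract does not describe how sensitivity and specificity were derived from the graded outputs, so the sketch below assumes the same 0/1 vs. 2/3 split and computes them from a 2×2 table. Recall, as reported for cord signal abnormality, is the same quantity as sensitivity. The function names `binarize` and `sensitivity_specificity` and the example grades are hypothetical.

```python
from typing import Sequence


def binarize(grades: Sequence[int], threshold: int = 2) -> list[bool]:
    """Collapse ordinal grades 0-3 into negative (0/1) vs. positive (2/3)."""
    return [g >= threshold for g in grades]


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> tuple[float, float]:
    """Sensitivity (= recall) and specificity of `predicted` against `reference`."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    # Cells of the 2x2 confusion table.
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical grades for six levels (reference standard vs. model).
reference_grades = [0, 1, 2, 3, 3, 0]
model_grades = [0, 1, 2, 3, 1, 0]
sens, spec = sensitivity_specificity(binarize(reference_grades), binarize(model_grades))
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")  # sensitivity=0.667, specificity=1.000
```

In this toy example the model under-grades one grade-3 level as grade 1, which becomes a false negative after binarization, while a grade 1 vs. grade 0 disagreement would not affect the binary metrics at all. This is why agreement can rise when grades are collapsed.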

Conclusions: Our DL model for degenerative cervical spondylosis on MRI showed good performance, demonstrating substantial agreement with the reference standard. This tool could assist radiologists in improving the efficiency and consistency of MRI cervical spondylosis assessments in clinical practice.

Keywords: Convolutional neural networks; Deep learning; Degenerative cervical spondylosis; MRI; Neural foramina; Spinal canal; Spinal cord.