Performance of the Winning Algorithms of the RSNA 2022 Cervical Spine Fracture Detection Challenge

Ghee Rye Lee; Adam E Flanders; Tyler Richards; Felipe Kitamura; Errol Colak; Hui Ming Lin; Robyn L Ball; Jason Talbott; Luciano M Prevedello

doi:10.1148/ryai.230256

Radiol Artif Intell. 2024 Jan;6(1):e230256. doi: 10.1148/ryai.230256.

Authors

Ghee Rye Lee¹, Adam E Flanders¹, Tyler Richards¹, Felipe Kitamura¹, Errol Colak¹, Hui Ming Lin¹, Robyn L Ball¹, Jason Talbott¹, Luciano M Prevedello¹

Affiliation

¹ From the Department of Radiology, Ohio State University Wexner Medical Center, 395 W 12th Ave, Columbus, OH 43210 (G.R.L., L.M.P.); Department of Radiology, Thomas Jefferson University, Philadelphia, Pa (A.E.F.); Department of Radiology, University of Utah School of Medicine, Salt Lake City, Utah (T.R.); DasaInova, Diagnósticos da América, São Paulo, Brazil (F.K.); Department of Diagnostic Imaging, Universidade Federal de São Paulo, São Paulo, Brazil (F.K.); Department of Medical Imaging, Unity Health Toronto, University of Toronto, Toronto, Canada (E.C., H.M.L.); The Jackson Laboratory, Bar Harbor, Maine (R.L.B.); and Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, Calif (J.T.).

Abstract

Purpose: To evaluate and report the performance of the winning algorithms of the Radiological Society of North America Cervical Spine Fracture AI Challenge.

Materials and Methods: The competition was open to the public on Kaggle from July 28 to October 27, 2022. A sample of 3112 CT scans with and without cervical spine fractures (CSFx) was assembled from multiple sites (12 institutions across six continents) and prepared for the competition. The test set had 1093 scans (private test set: n = 789; mean age, 53.40 years ± 22.86 [SD]; 509 males; public test set: n = 304; mean age, 52.51 years ± 20.73; 189 males) and 847 fractures. The eight top-performing artificial intelligence (AI) algorithms were retrospectively evaluated, and the area under the receiver operating characteristic curve (AUC) value, F1 score, sensitivity, and specificity were calculated.

Results: A total of 1108 contestants composing 883 teams worldwide participated in the competition. The top eight AI models showed high performance, with a mean AUC value of 0.96 (95% CI: 0.95, 0.96), mean F1 score of 90% (95% CI: 90%, 91%), mean sensitivity of 88% (95% CI: 86%, 90%), and mean specificity of 94% (95% CI: 93%, 96%). The highest values reported for previous models were an AUC of 0.85, F1 score of 81%, sensitivity of 76%, and specificity of 97%.

Conclusion: The competition successfully facilitated the development of AI models that could detect and localize CSFx on CT scans with high performance outcomes, which appear to exceed known values of previously reported models. Further study is needed to evaluate the generalizability of these models in a clinical environment.

Keywords: Cervical Spine, Fracture Detection, Machine Learning, Artificial Intelligence Algorithms, CT, Head/Neck

Supplemental material is available for this article.

© RSNA, 2024.
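For readers less familiar with the four metrics named in the abstract, the following is a minimal sketch of their standard definitions for a binary fracture/no-fracture label. It is not the challenge's scoring code or the authors' analysis pipeline: the function names, the 0.5 decision threshold, and the input layout are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = fracture present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true fractures that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of fracture-free cases that were correctly cleared."""
    return tn / (tn + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> list:
    """Turn model scores into binary predictions (the cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]
```

Sensitivity, specificity, and F1 score depend on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is why the abstract reports them separately.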

Keywords: Artificial Intelligence Algorithms; CT; Cervical Spine; Fracture Detection; Head/Neck; Machine Learning.

MeSH terms

Algorithms
Artificial Intelligence
Cervical Vertebrae / diagnostic imaging
Fractures, Bone*
Humans
Male
Middle Aged
Retrospective Studies
Spinal Fractures* / diagnosis