Artificial intelligence diagnostic accuracy in fracture detection from plain radiographs and comparing it with clinicians: a systematic review and meta-analysis

A Nowroozi; M A Salehi; P Shobeiri; S Agahi; S Momtazmanesh; P Kaviani; M K Kalra

doi:10.1016/j.crad.2024.04.009

Clin Radiol. 2024 Aug;79(8):579-588. doi: 10.1016/j.crad.2024.04.009. Epub 2024 Apr 20.

Authors

A Nowroozi¹, M A Salehi¹, P Shobeiri¹, S Agahi¹, S Momtazmanesh¹, P Kaviani², M K Kalra³

Affiliations

¹ School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
² Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.
³ Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.

PMID: 38772766

Abstract

Purpose: Fracture detection is one of the most commonly used and studied applications of artificial intelligence (AI) in medicine. In this systematic review and meta-analysis, we aimed to summarize the available literature and data on AI performance in fracture detection on plain radiographs and the factors affecting it.

Methods: We systematically reviewed studies evaluating AI algorithms in detecting bone fractures in plain radiographs, combined their performance using meta-analysis (a bivariate regression approach), and compared it with that of clinicians. We also analyzed the factors potentially affecting algorithm performance using meta-regression.
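As an illustration of the per-study quantities that feed such an analysis, the sketch below derives sensitivity and specificity from a single 2x2 confusion matrix and computes their logit transforms, the scale on which bivariate models are commonly fitted. It is a minimal sketch under stated assumptions: the function names, the 0.5 continuity correction, and the example counts are hypothetical and are not taken from the review, and the bivariate mixed-effects model itself is not implemented here.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from one study's confusion matrix."""
    sensitivity = tp / (tp + fn)  # fractures correctly flagged
    specificity = tn / (tn + fp)  # non-fractures correctly cleared
    return sensitivity, specificity


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def logit_pair(tp, fp, fn, tn, correction=0.5):
    """Logit sensitivity and specificity for one study.

    A continuity correction (assumed here to be 0.5) is added to every
    cell so that studies with a zero cell still give finite logits.
    """
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return logit(tp / (tp + fn)), logit(tn / (tn + fp))


# Hypothetical counts for a single study (not data from the review).
sens, spec = sens_spec(tp=90, fp=8, fn=10, tn=92)
logit_sens, logit_spec = logit_pair(tp=90, fp=8, fn=10, tn=92)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A bivariate model would then treat each study's logit pair as a draw from a joint distribution, which lets the pooled estimates account for the correlation between sensitivity and specificity across studies.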

Results: Our analysis included 100 studies. In the 83 studies with confusion matrices, AI algorithms showed a pooled sensitivity of 91.43% and specificity of 92.12% (area under the summary receiver operating characteristic curve = 0.968). After adjustment and false discovery rate (FDR) correction, tibia/fibula (excluding ankle) fractures were associated with higher AI sensitivity (7.0%, p=0.004), while more recent publications (5.5%, p=0.003) and the Xception architecture (6.6%, p<0.001) were associated with higher specificity. Clinicians and AI showed similar specificity in fracture identification, although AI tended toward higher sensitivity (7.6%, p=0.07). Radiologists, on the other hand, were more specific than AI overall and in several subgroups, and were more sensitive to hip fractures before FDR correction.
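The abstract does not name the false discovery rate procedure that was used. The sketch below assumes the widely used Benjamini-Hochberg step-up method and shows how a set of meta-regression p-values could be adjusted. The function name and the p-values are hypothetical and are not results from the review.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five covariates (not data from the review).
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

In this hypothetical set, the third and fourth covariates fall below 0.05 before adjustment but not after it, which is the kind of change the phrase "before FDR correction" signals.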

Conclusions: Currently available AI aids could significantly improve care where radiologists are not readily available. Moreover, identifying the factors that affect algorithm performance could guide AI development teams in optimizing their products.

Publication types

Systematic Review
Meta-Analysis
Comparative Study

MeSH terms

Algorithms
Artificial Intelligence*
Fractures, Bone* / diagnostic imaging
Humans
Radiographic Image Interpretation, Computer-Assisted / methods
Reproducibility of Results
Sensitivity and Specificity*