Purpose: Fracture detection is one of the most commonly used and studied aspects of artificial intelligence (AI) in medicine. In this systematic review and meta-analysis, we aimed to summarize available literature and data regarding AI performance in fracture detection on plain radiographs and various factors affecting it.
Methods: We systematically reviewed studies evaluating AI algorithms in detecting bone fractures in plain radiographs, combined their performance using meta-analysis (a bivariate regression approach), and compared it with that of clinicians. We also analyzed the factors potentially affecting algorithm performance using meta-regression.
Results: Our analysis included 100 studies. In 83 studies with confusion matrices, AI algorithms showed a sensitivity of 91.43% and a specificity of 92.12% (Area under the summary receiver operator curve = 0.968). After adjustment and false discovery rate correction, tibia/fibula (excluding ankle) fractures were associated with higher (7.0%, p=0.004) AI sensitivity, while more recent publications (5.5%, p=0.003) and Xception architecture (6.6%, p<0.001) were associated with higher specificity. Clinicians and AI showed similar specificity in fracture identification, although AI leaned to higher sensitivity (7.6%, p=0.07). Radiologists, on the other hand, were more specific than AI overall and in several subgroups, and more sensitive to hip fractures before FDR correction.
Conclusions: Currently available AI aids could result in a significant improvement in care where radiologists are not readily available. Moreover, identifying factors affecting algorithm performance could guide AI development teams in their process of optimizing their products.
Copyright © 2024. Published by Elsevier Ltd.