Background: Acute abdominal pain (AAP) accounts for 5-10% of all emergency department (ED) visits, and appendicitis is a prevalent cause of AAP that often necessitates surgical intervention. The variability in AAP symptoms and causes, combined with the challenge of identifying appendicitis, complicates timely intervention. Scoring systems such as the Alvarado score have been developed to estimate the risk of appendicitis, but diagnostic errors and delays remain common. Although various machine learning (ML) models have been proposed to enhance appendicitis detection, none has been seamlessly integrated into ED workflows for AAP or specifically designed to diagnose appendicitis as early as possible within the clinical decision-making process. To mimic daily clinical practice, this proof-of-concept study aims to develop ML models that support decision-making by using comprehensive clinical data available up to key decision points in the ED workflow to detect appendicitis in patients presenting with AAP.
Methods: Data from the Dutch triage system at the ED, vital signs, complete medical history, physical examination findings, and routine laboratory test results were retrospectively extracted for 350 patients with AAP presenting to the ED of a Dutch teaching hospital from 2016 to 2023. Two eXtreme Gradient Boosting ML models were developed to differentiate appendicitis from other causes of AAP: one model used all data up to and including physical examination, and the other additionally used routine laboratory test results. The performance of both models was evaluated on a validation set (n = 68) and compared with that of the Alvarado scoring system and of three ED physicians in a reader study.
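The two-model design described above can be pictured as restricting each model to the data available at its decision point. The following is only a minimal sketch of that idea, not the study's pipeline: the feature names, group names, and decision-point labels are hypothetical placeholders that do not come from the study.

```python
# Hypothetical feature groups; every name below is an illustrative assumption.
FEATURE_GROUPS = {
    "triage": ["triage_urgency"],
    "vital_signs": ["heart_rate", "temperature"],
    "history": ["pain_migration", "nausea"],
    "physical_exam": ["rlq_tenderness", "rebound_pain"],
    "laboratory": ["crp", "leukocytes"],
}

# Feature groups assumed to be available at each decision point, in workflow order.
DECISION_POINTS = {
    "after_physical_exam": ["triage", "vital_signs", "history", "physical_exam"],
    "after_laboratory": [
        "triage", "vital_signs", "history", "physical_exam", "laboratory",
    ],
}


def features_at(decision_point):
    """Return the feature names a model may use at the given decision point."""
    groups = DECISION_POINTS[decision_point]
    return [name for group in groups for name in FEATURE_GROUPS[group]]


def project(record, decision_point):
    """Keep only the fields of a patient record available at the decision point."""
    return {name: record.get(name) for name in features_at(decision_point)}
```

Under these assumptions, a classifier trained on `project(record, "after_physical_exam")` never sees laboratory values, whereas one trained on `project(record, "after_laboratory")` sees the full feature set.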
Results: The ML models achieved AUROCs of 0.919 without laboratory test results and 0.923 with the addition of laboratory test results. The Alvarado scoring system attained an AUROC of 0.824. ED physicians achieved AUROCs of 0.894, 0.826, and 0.791 without laboratory test results, increasing to AUROCs of 0.923, 0.892, and 0.859 with laboratory test results.
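For readers less familiar with the metric, the AUROC can be read as the probability that a randomly chosen appendicitis case receives a higher score than a randomly chosen non-appendicitis case, with ties counted as one half. The sketch below computes it directly from that definition; the function name is an assumption for illustration, and it is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of cases, which is adequate for a validation set of this size; rank-based implementations give the same value more efficiently.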
Conclusions: Both ML models demonstrated comparably high accuracy in predicting appendicitis in patients with AAP, outperforming the Alvarado scoring system. The ML models matched or surpassed ED physician performance in detecting appendicitis, with the largest potential performance gain observed in the absence of laboratory test results. Integration of these models into the ED workflow could assist ED physicians in the early and accurate diagnosis of appendicitis.
Keywords: Acute abdominal pain; Appendicitis; Artificial intelligence; Clinical decision support; Diagnostic follow-up; Emergency department; Machine learning.
© 2024. The Author(s).