Prospective Evaluation of Real-Time Artificial Intelligence for the Hill Classification of the Gastroesophageal Junction

Ioannis Kafetzis; Philipp Sodmann; Bianca-Elena Herghelegiu; Markus Brand; Wolfram G Zoller; Florian Seyfried; Karl-Hermann Fuchs; Alexander Meining; Alexander Hann

doi:10.1002/ueg2.12721

United European Gastroenterol J. 2024 Dec 12. doi: 10.1002/ueg2.12721. Online ahead of print.

Authors

Ioannis Kafetzis¹, Philipp Sodmann¹, Bianca-Elena Herghelegiu¹, Markus Brand¹, Wolfram G Zoller², Florian Seyfried³, Karl-Hermann Fuchs¹, Alexander Meining¹, Alexander Hann¹

Affiliations

¹ Interventional and Experimental Endoscopy (InExEn), Department of Internal Medicine 2, University Hospital Würzburg, Würzburg, Germany.
² Department of Internal Medicine and Gastroenterology, Katharinenhospital, Stuttgart, Germany.
³ Department of General, Visceral, Transplantation, Vascular, and Pediatric Surgery, Center of Operative Medicine (ZOM), University Hospital Würzburg, Würzburg, Germany.

PMID: 39668544
DOI: 10.1002/ueg2.12721

Abstract

Background: Assessment of the gastroesophageal junction (GEJ) is an integral part of gastroscopy; however, the lack of standardized reporting hinders consistent documentation of examinations. The Hill classification offers a standardized approach for evaluating the GEJ. This study aims to compare the accuracy of an artificial intelligence (AI) system with that of physicians in classifying the GEJ according to Hill in a prospective, blinded superiority trial.

Methods: Consecutive patients scheduled for gastroscopy with an intact GEJ were recruited during clinical routine from October 2023 to December 2023. Nine physicians (six experienced, three inexperienced) assessed the Hill grade while the AI system operated in the background in real time. The gold standard was determined by a majority vote of independent assessments by three expert endoscopists who did not participate in the study. The primary outcome was accuracy. Secondary outcomes were a per-Hill-grade analysis and separate comparisons for experienced and inexperienced endoscopists.
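As an illustration of the gold-standard and accuracy definitions above, the following is a minimal sketch, not the study's actual code. The function names, the example ratings, and the handling of cases without a majority (returned as None and skipped) are assumptions; the abstract does not state how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(ratings: Sequence[int]) -> Optional[int]:
    """Return the Hill grade chosen by a strict majority of raters, else None."""
    grade, count = Counter(ratings).most_common(1)[0]
    return grade if count > len(ratings) / 2 else None


def accuracy(predicted: Sequence[int], reference: Sequence[Optional[int]]) -> float:
    """Fraction of examinations where the prediction matches the reference grade.

    Examinations without a majority reference are skipped (an assumption).
    """
    pairs = [(p, r) for p, r in zip(predicted, reference) if r is not None]
    if not pairs:
        raise ValueError("no examinations with a reference grade")
    return sum(p == r for p, r in pairs) / len(pairs)


# Made-up ratings from three experts for four examinations (Hill grades 1-4).
expert_ratings = [(1, 1, 2), (3, 3, 3), (2, 4, 4), (1, 2, 3)]
gold = [majority_vote(r) for r in expert_ratings]

# Made-up AI predictions for the same four examinations.
ai_grades = [1, 3, 3, 2]
ai_accuracy = accuracy(ai_grades, gold)
print(gold, ai_accuracy)
```

With three raters and four possible grades, all three experts can disagree, so any real implementation needs an explicit rule for that case.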

Results: In 131 analysed examinations, the AI's accuracy of 84.7% (95% CI: 78.6-90.8) was significantly higher than the physicians' accuracy of 62.5% (95% CI: 54.2-71) (p < 0.01). The AI outperformed physicians in all but one case in the per-Hill-class analysis. The AI was significantly more accurate than inexperienced physicians (85% vs. 56%, p < 0.01) and showed a trend toward higher accuracy than experienced physicians (84% vs. 69.6%, p = 0.07).
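For readers who want to see how a confidence interval of this kind can be obtained, here is a minimal sketch using a normal-approximation (Wald) interval. The abstract does not state which interval method was used, so the choice of method is an assumption. The count of 111 correct classifications is also an assumption, inferred from 84.7% of 131 rather than reported.

```python
import math
from typing import Tuple


def wald_ci(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1] because the Wald interval can overshoot near the bounds.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed count: 111 of 131 is inferred from the reported 84.7%, not stated.
low, high = wald_ci(111, 131)
print(f"{111 / 131:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

The result lies close to the reported interval, but small differences in rounding or method (for example a Wilson or exact interval) would shift the bounds slightly.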

Conclusions: AI was significantly more accurate than examiners in assessing the Hill classification. This superior model performance can prove beneficial for endoscopists, especially those with limited experience.

Trial registration: ClinicalTrials.gov identifier: NCT06040723.

Keywords: AI; ConvNext; EndoMind; GEJ; accuracy; deep learning; esophagogastroduodenoscopy; gastroscopy.

Associated data

ClinicalTrials.gov/NCT06040723
