Background: Assessment of the gastroesophageal junction (GEJ) is an integral part of gastroscopy; however, the absence of standardized reporting hinders consistent documentation of examinations. The Hill classification offers a standardized approach for evaluating the GEJ. This study aims to compare the accuracy of an artificial intelligence (AI) system with that of physicians in classifying the GEJ according to the Hill classification in a prospective, blinded superiority trial.
Methods: Consecutive patients scheduled for gastroscopy with an intact GEJ were recruited during clinical routine from October 2023 to December 2023. Nine physicians (six experienced, three inexperienced) assessed the Hill grade while the AI system operated in the background in real time. The gold standard was determined by a majority vote of independent assessments by three expert endoscopists who did not participate in the study. The primary outcome was accuracy. Secondary outcomes were a per-Hill-grade analysis and separate comparisons of the AI with experienced and with inexperienced endoscopists.
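The gold-standard and accuracy computation described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names, the toy grades, the exclusion of cases without a strict majority, and the use of a Wald (normal-approximation) 95% interval are all assumptions, since the abstract states none of these details.

```python
from collections import Counter
from math import sqrt


def majority_grade(votes):
    """Return the Hill grade chosen by more than half of the raters, else None.

    Assumption: a case on which all raters disagree has no gold standard.
    """
    grade, count = Counter(votes).most_common(1)[0]
    return grade if count > len(votes) / 2 else None


def accuracy_with_ci(predicted, reference, z=1.96):
    """Accuracy with a Wald interval (assumed method), clipped to [0, 1]."""
    # Keep only cases that have a gold-standard grade.
    pairs = [(p, r) for p, r in zip(predicted, reference) if r is not None]
    n = len(pairs)
    acc = sum(p == r for p, r in pairs) / n
    half_width = z * sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical toy data: Hill grades (1-4) from three expert raters per case.
expert_votes = [(1, 1, 2), (3, 3, 3), (2, 4, 2), (1, 2, 3), (4, 4, 3)]
gold = [majority_grade(v) for v in expert_votes]

# Hypothetical AI grades for the same five cases.
ai_grades = [1, 3, 2, 2, 3]
acc, lo, hi = accuracy_with_ci(ai_grades, gold)
print(f"accuracy = {acc:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

The same function could be applied to the physicians' grades to obtain the comparison arm; with only a handful of cases, as in this toy example, the Wald interval is very wide, and an exact or Wilson interval might be preferred.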
Results: In 131 analysed examinations, the AI's accuracy of 84.7% (95% CI: 78.6-90.8) was significantly higher than the physicians' accuracy of 62.5% (95% CI: 54.2-71) (p < 0.01). In the per-Hill-class analysis, the AI outperformed physicians in all but one class. The AI was significantly more accurate than inexperienced physicians (85% vs. 56%, p < 0.01) and showed a trend toward higher accuracy than experienced physicians (84% vs. 69.6%, p = 0.07).
Conclusions: The AI was significantly more accurate than examiners in assessing the Hill classification. This superior model performance may benefit endoscopists, especially those with limited experience.
Trial registration: ClinicalTrials.gov identifier: NCT06040723.
Keywords: AI; ConvNext; EndoMind; GEJ; accuracy; deep learning; esophagogastroduodenoscopy; gastroscopy.
© 2024 The Author(s). United European Gastroenterology Journal published by Wiley Periodicals LLC on behalf of United European Gastroenterology.