Evaluating the pathological and clinical implications of errors made by an artificial intelligence colon biopsy screening tool

BMJ Open Gastroenterol. 2025 Jan 6;12(1):e001649. doi: 10.1136/bmjgast-2024-001649.

Abstract

Objective: Artificial intelligence (AI) tools for histological diagnosis offer great potential for healthcare, yet failure to understand their clinical context is delaying adoption. IGUANA (Interpretable Gland-Graphs using a Neural Aggregator) is an AI algorithm, designed to automatically report normal cases, that can effectively classify colonic biopsies as normal or abnormal. We performed a retrospective pathological and clinical review of the errors made by IGUANA.

Methods: False negative (FN) errors were the primary focus because they carry the greatest potential for harm. Pathological evaluation involved assessing whole slide image (WSI) quality, establishing the precise diagnosis for each missed entity and identifying factors that impeded diagnosis. Clinical evaluation scored the impact of each error on the patient and classified that impact as a missed diagnosis, a missed investigation or a missed treatment.

Results: Across 5054 WSIs from 2080 UK National Health Service patients, there were 220 FN errors across 164 cases (4.4% of WSIs, 7.9% of cases). Missed diagnoses ranged from adenocarcinoma to mild inflammation. 88.4% of FN errors would have had no impact on patient care, and only one error would have caused major patient harm. Factors that protected against harm included the missed biopsy being a low-risk polyp and the diagnostic features being detected in other biopsies.
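The two headline error rates are simple ratios of the counts reported above. The sketch below reproduces them in Python, assuming that the case-level denominator is the 2080 patients (one case per patient), which is consistent with the reported 7.9%. The helper name `error_rate_percent` is hypothetical and serves only as an illustration.

```python
def error_rate_percent(errors: int, total: int) -> float:
    """Return errors as a percentage of total, rounded to one decimal place."""
    return round(100 * errors / total, 1)

# WSI level: slides with a false-negative call / all slides reviewed
wsi_rate = error_rate_percent(220, 5054)

# Case level: cases with at least one FN slide / all cases
# (assumption: 2080 cases, i.e. one case per patient)
case_rate = error_rate_percent(164, 2080)

print(wsi_rate, case_rate)
```

Both percentages count the same 220 FN slides, but the case-level rate is higher. The reason is that the errors are concentrated in 164 cases, and any single missed slide is enough to count a whole case as an error.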

Conclusion: Most FN errors would not result in patient harm. This suggests that, even with a 7.9% case-level error rate, this AI tool may be more suitable for adoption than the raw error statistics imply. Considering the clinical context of AI tool errors is essential for safe implementation.

Keywords: COLORECTAL DISEASES; COLORECTAL PATHOLOGY; COMPUTERISED IMAGE ANALYSIS.

MeSH terms

  • Adenocarcinoma / diagnosis
  • Adenocarcinoma / pathology
  • Adult
  • Aged
  • Algorithms
  • Artificial Intelligence*
  • Biopsy / methods
  • Biopsy / statistics & numerical data
  • Colon* / pathology
  • Colonoscopy / methods
  • Colonoscopy / statistics & numerical data
  • Diagnostic Errors* / prevention & control
  • Diagnostic Errors* / statistics & numerical data
  • Early Detection of Cancer / methods
  • False Negative Reactions
  • Female
  • Humans
  • Male
  • Middle Aged
  • Retrospective Studies
  • United Kingdom / epidemiology