A video based benchmark data set (ENDOTEST) to evaluate computer-aided polyp detection systems

Scand J Gastroenterol. 2022 Nov;57(11):1397-1403. doi: 10.1080/00365521.2022.2085059. Epub 2022 Jun 14.

Abstract

Background and aims: Computer-aided polyp detection (CADe) may become a standard for polyp detection during colonoscopy. Several systems are already commercially available. We report on a video-based benchmark technique for the first preclinical assessment of such systems before comparative randomized trials are to be undertaken. Additionally, we compare a commercially available CADe system with our newly developed one.

Methods: ENDOTEST consisted in the combination of two datasets. The validation dataset contained 48 video-snippets with 22,856 manually annotated images of which 53.2% contained polyps. The performance dataset contained 10 full-length screening colonoscopies with 230,898 manually annotated images of which 15.8% contained a polyp. Assessment parameters were accuracy for polyp detection and time delay to first polyp detection after polyp appearance (FDT). Two CADe systems were assessed: a commercial CADe system (GI-Genius, Medtronic), and a self-developed new system (ENDOMIND). The latter being a convolutional neuronal network trained on 194,983 manually labeled images extracted from colonoscopy videos recorded in mainly six different gastroenterologic practices.

Results: On the ENDOTEST, both CADe systems detected all polyps in at least one image. The per-frame sensitivity and specificity in full colonoscopies was 48.1% and 93.7%, respectively for GI-Genius; and 54% and 92.7%, respectively for ENDOMIND. Median FDT of ENDOMIND with 217 ms (Inter-Quartile Range(IQR)8-1533) was significantly faster than GI-Genius with 1050 ms (IQR 358-2767, p = 0.003).

Conclusions: Our benchmark ENDOTEST may be helpful for preclinical testing of new CADe devices. There seems to be a correlation between a shorter FDT with a higher sensitivity and a lower specificity for polyp detection.

Keywords: CADe; Colonoscopy; artificial intelligence; deep learning; polyp.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Colonic Polyps* / diagnostic imaging
  • Colonoscopy / methods
  • Humans
  • Mass Screening