Reproducible Reporting of the Collection and Evaluation of Annotations for Artificial Intelligence Models

Mod Pathol. 2024 Apr;37(4):100439. doi: 10.1016/j.modpat.2024.100439. Epub 2024 Jan 28.

Abstract

This work puts forth and demonstrates the utility of a reporting framework for collecting and evaluating annotations of medical images used for training and testing artificial intelligence (AI) models that assist in detection and diagnosis. AI has unique reporting requirements, as shown by the AI extensions to the Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) checklists and the proposed AI extensions to the Standards for Reporting Diagnostic Accuracy (STARD) and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklists. AI for detection and/or diagnostic image analysis requires complete, reproducible, and transparent reporting of the annotations and metadata used in training and testing data sets. In an earlier work, other researchers proposed an annotation workflow and quality checklist for computational pathology annotations. In this manuscript, we operationalize that workflow into an evaluable quality checklist that applies to any reader-interpreted medical images, and we demonstrate its use for an annotation effort in digital pathology. We refer to this quality framework as the Collection and Evaluation of Annotations for Reproducible Reporting of Artificial Intelligence (CLEARR-AI).

Keywords: Annotation study; Artificial intelligence validation; Data set; Reference standard; Reproducible research; Digital pathology.

MeSH terms

  • Artificial Intelligence*
  • Checklist*
  • Humans
  • Image Processing, Computer-Assisted
  • Prognosis
  • Research Design