Reproducible Reporting of the Collection and Evaluation of Annotations for Artificial Intelligence Models

Mod Pathol. 2024 Apr;37(4):100439. doi: 10.1016/j.modpat.2024.100439. Epub 2024 Jan 28.

Abstract

This work puts forth and demonstrates the utility of a reporting framework for collecting and evaluating annotations of medical images used for training and testing artificial intelligence (AI) models that assist in detection and diagnosis. AI has unique reporting requirements, as shown by the AI extensions to the Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) checklists and the proposed AI extensions to the Standards for Reporting Diagnostic Accuracy (STARD) and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklists. AI for detection and/or diagnostic image analysis requires complete, reproducible, and transparent reporting of the annotations and metadata used in training and testing data sets. In an earlier work, other researchers proposed an annotation workflow and quality checklist for computational pathology annotations. In this manuscript, we operationalize that workflow into an evaluable quality checklist that applies to any reader-interpreted medical images, and we demonstrate its use for an annotation effort in digital pathology. We refer to this quality framework as the Collection and Evaluation of Annotations for Reproducible Reporting of Artificial Intelligence (CLEARR-AI).

Keywords: Annotation study; Artificial intelligence validation; Data set; Reference standard; Reproducible research; Digital pathology.

MeSH terms

  • Artificial Intelligence*
  • Checklist*
  • Humans
  • Image Processing, Computer-Assisted
  • Prognosis
  • Research Design