Ecologically sustainable benchmarking of AI models for histopathology

Yu-Chia Lan; Martin Strauch; Pourya Pilva; Nikolas E J Schmitz; Alireza Vafaei Sadr; Leon Niggemeier; Huong Quynh Nguyen; David L Hölscher; Tri Q Nguyen; Jesper Kers; Roman D Bülow; Peter Boor

doi:10.1038/s41746-024-01397-x

NPJ Digit Med. 2024 Dec 24;7(1):378. doi: 10.1038/s41746-024-01397-x.

Authors

Yu-Chia Lan¹, Martin Strauch¹, Pourya Pilva¹, Nikolas E J Schmitz¹, Alireza Vafaei Sadr^{1,2}, Leon Niggemeier¹, Huong Quynh Nguyen¹, David L Hölscher^{1,3}, Tri Q Nguyen⁴, Jesper Kers^{5,6}, Roman D Bülow^#¹, Peter Boor^#^{7,8}

Affiliations

¹ Institute of Pathology, University Clinic Aachen, RWTH Aachen University, Aachen, Germany.
² Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA.
³ Department of Nephrology and Clinical Immunology, University Hospital Aachen, RWTH University Aachen, Aachen, Germany.
⁴ Department of Pathology, University Medical Centre Utrecht, Utrecht, The Netherlands.
⁵ Department of Pathology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
⁶ Department of Pathology, Leiden Transplant Center, Leiden University Medical Center, Leiden, The Netherlands.
⁷ Institute of Pathology, University Clinic Aachen, RWTH Aachen University, Aachen, Germany.
⁸ Department of Nephrology and Clinical Immunology, University Hospital Aachen, RWTH University Aachen, Aachen, Germany.

^# Contributed equally.

PMID: 39719527
DOI: 10.1038/s41746-024-01397-x

Abstract

Deep learning (DL) holds great promise for improving medical diagnostics, including pathology. Current DL research focuses mainly on performance. DL implementation can have environmental consequences, but approaches for jointly assessing performance and carbon footprint are missing. Here, we explored an approach to developing DL for pathology that considers both diagnostic performance and carbon footprint, calculated as CO₂ or equivalent emissions (CO₂eq). We evaluated various DL architectures used in computational pathology, including a large foundation model, across two diagnostic tasks of low and high complexity. We proposed a metric termed 'environmentally sustainable performance' (ESPer), which quantitatively integrates performance and operational CO₂eq during training and inference. While some DL models showed comparable diagnostic performance, ESPer enabled prioritizing those with a smaller carbon footprint. We also investigated how data reduction approaches can improve the ESPer of individual models. This study provides an approach that facilitates the development of environmentally friendly, sustainable medical AI.
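
The abstract does not state how ESPer is computed, so the following Python sketch does not reproduce it. It only illustrates the general idea of weighing performance against operational emissions, assuming the common estimate that operational CO₂eq equals energy consumed multiplied by the carbon intensity of the electricity grid. The names `ModelRun`, `performance_per_kg`, and `rank_runs`, the simple ratio used as a score, and all numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRun:
    """One evaluated model (all fields are hypothetical illustration values)."""
    name: str
    performance: float  # diagnostic performance score, e.g. in [0, 1]
    energy_kwh: float   # energy measured for training plus inference


def operational_co2eq_kg(energy_kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Estimate operational emissions as energy times grid carbon intensity."""
    return energy_kwh * grid_intensity_kg_per_kwh


def performance_per_kg(run: ModelRun, grid_intensity_kg_per_kwh: float) -> float:
    """Stand-in score, NOT the paper's ESPer: performance per kg CO2eq."""
    emissions = operational_co2eq_kg(run.energy_kwh, grid_intensity_kg_per_kwh)
    return run.performance / emissions


def rank_runs(runs: List[ModelRun], grid_intensity_kg_per_kwh: float) -> List[ModelRun]:
    """Sort runs from best to worst by the stand-in score."""
    return sorted(
        runs,
        key=lambda r: performance_per_kg(r, grid_intensity_kg_per_kwh),
        reverse=True,
    )
```

With two hypothetical runs of near-equal performance, for example `ModelRun("large", 0.90, 10.0)` and `ModelRun("small", 0.89, 2.0)`, and an assumed grid intensity of 0.4 kg CO₂eq per kWh, `rank_runs` places the lower-energy model first, which mirrors the kind of prioritization the abstract describes.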

Grants and funding