A Distributed System Improves Inter-Observer and AI Concordance in Annotating Interstitial Fibrosis and Tubular Atrophy

Proc SPIE Int Soc Opt Eng. 2021 Feb:11603:116030V. doi: 10.1117/12.2581789. Epub 2021 Feb 15.

Abstract

Histologic examination of interstitial fibrosis and tubular atrophy (IFTA) is critical for determining the extent of irreversible kidney injury in renal disease. The current clinical standard is a pathologist's visual assessment of IFTA, which is prone to inter-observer variability. To address this diagnostic variability, we designed two case studies (CSs) involving seven pathologists, using HistomicsTK, a distributed system developed by Kitware Inc. (Clifton Park, NY). Twenty-five whole slide images (WSIs) were divided into a training set of 21 and a validation set of four. The training set was split into seven unique subsets, each provided to an individual pathologist along with the four common WSIs of the validation set. In CS1, all pathologists individually annotated IFTA in their respective slides. These annotations were then used to train a deep learning algorithm to computationally segment IFTA. In CS2, the annotators first reviewed the manual and computational annotations from CS1 to improve concordance of IFTA annotation. Both the manual and computational annotation processes were then repeated as in CS1. Inter-observer concordance in the validation set was measured by Krippendorff's alpha (KA). The KA for the seven pathologists was 0.62 (confidence interval [CI] 0.57 to 0.67) in CS1 and, after they reviewed each other's annotations, 0.66 (CI 0.60 to 0.72) in CS2. When the deep learner was included as an eighth annotator, the KA was 0.58 (CI 0.52 to 0.64) in CS1 and 0.63 (CI 0.56 to 0.69) in CS2. These results suggest that our annotation framework refines agreement in the spatial annotation of IFTA and demonstrates a human-AI approach to significantly improve the development of computational models.
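For readers unfamiliar with the concordance measure, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not the authors' implementation. The abstract does not state the unit of comparison, the level of measurement, or how the confidence intervals were obtained, so the sketch assumes each unit (for example a pixel or tile) receives one categorical label per annotator, such as 1 for IFTA and 0 for non-IFTA. The function name `krippendorff_alpha_nominal` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    `units` is a list of lists; each inner list holds the labels that the
    annotators assigned to one unit (assumed here: one pixel or tile).
    Units rated by fewer than two annotators are ignored.
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of ratings of that unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1) if n > 1 else 0.0

    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


# Toy data (hypothetical): two annotators, four units, labels 1 = IFTA, 0 = other.
toy_units = [[0, 0], [1, 1], [0, 1], [0, 0]]
toy_alpha = krippendorff_alpha_nominal(toy_units)
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the reported values of 0.58 to 0.66 should be read.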