Purpose: Malignant gliomas represent an aggressive class of central nervous system neoplasms. Correlation of interventional outcomes with tumor morphometry data necessitates 3D segmentation of tumors (typically based on magnetic resonance imaging). Expert delineation is the long-held gold standard for tumor segmentation, but is exceptionally resource intensive and subject to intra-rater and inter-rater variability. Automated tumor segmentation algorithms have been demonstrated for a variety of imaging modalities and tumor phenotypes, but translation of these methods across clinical study designs is problematic given variation in image acquisition, tumor characteristics, segmentation objectives, and validation criteria. Herein, the authors demonstrate an alternative approach for high-throughput tumor segmentation using Internet-based, collaborative labeling.
Methods: In a study of 85 human raters and 98 tumor patients, raters were recruited from a general university campus population (i.e., no specific medical knowledge), given minimal training, and provided with web-based tools to label MR images on 2D cross sections. The labeling goal was to extract the enhancing tumor cores on T1-weighted MRI and the bright abnormality on T2-weighted MRI. An experienced rater manually constructed ground truth volumes for a randomly sampled subcohort of 48 tumor subjects (for both T1w and T2w). Raters' taskwise individual observations, as well as the volumewise truth estimates obtained via statistical fusion, were evaluated over the subjects with ground truth.
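The abstract does not state which statistical fusion method was used. As a minimal sketch only, assuming per-pixel majority voting as a stand-in for the fusion step, the following shows how several raters' 2D binary labels for each slice could be combined and stacked into a 3D volume estimate. The function names `majority_vote_fusion` and `fuse_volume` are hypothetical and do not come from the study.

```python
import numpy as np


def majority_vote_fusion(rater_masks):
    """Fuse binary masks of shape (n_raters, H, W) by per-pixel majority vote.

    Assumption: majority voting stands in for the study's (unspecified)
    statistical fusion method. Ties resolve to background.
    """
    masks = np.asarray(rater_masks, dtype=bool)
    votes = masks.sum(axis=0)            # number of raters marking each pixel
    return votes * 2 > masks.shape[0]    # strict majority


def fuse_volume(slices_by_raters):
    """Fuse each slice independently, then stack into a (n_slices, H, W) volume."""
    return np.stack([majority_vote_fusion(m) for m in slices_by_raters], axis=0)


if __name__ == "__main__":
    # Three hypothetical raters labeling one 2x2 slice.
    slice_labels = [
        [[1, 1], [0, 0]],
        [[1, 0], [0, 0]],
        [[1, 1], [0, 1]],
    ]
    print(majority_vote_fusion(slice_labels))
```

Majority voting treats every rater as equally reliable; fusion methods that estimate per-rater performance would weight the votes instead.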
Results: Individual raters were able to reliably characterize (with >0.8 Dice similarity coefficient, DSC) the gadolinium-enhancing cores and extent of the edematous areas only slightly more than half of the time. Nevertheless, raters produced these highly variable segmentations quickly (less than 20 s per slice). When statistical fusion was used to combine the results of seven raters per slice for all slices in the datasets, the 3D agreement of the fused results with expertly delineated segmentations was on par with the inter-rater reliability observed between experienced raters using traditional 3D tools (approximately 0.85 DSC). The cumulative time spent per tumor patient with the collaborative approach was equivalent to that with an experienced rater, but the collaborative approach could be achieved with less training time, fewer resources, and efficient parallelization.
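For reference, the Dice similarity coefficient between two binary segmentations A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch of how it could be computed on label arrays follows; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details from the study.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


if __name__ == "__main__":
    rater = [[1, 1], [0, 0]]    # hypothetical rater label
    truth = [[1, 0], [0, 0]]    # hypothetical ground truth
    print(dice(rater, truth))   # overlap 1, sizes 2 and 1, so 2*1/3
```

The same function applies unchanged to 3D volumes, since it only counts overlapping and total labeled voxels.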
Conclusions: Collaborative labeling is a promising technique with potentially wide applicability to cost-effective manual labeling of medical images.