Background: Estimating the extent of affected skin is an important unmet clinical need, both for research and for practical management, in many diseases. In particular, the cutaneous burden of chronic graft-vs-host disease (cGVHD) is a primary outcome in many trials. Despite advances in artificial intelligence and 3D photography, progress toward reliable automated techniques is hindered by the limited expert time available to delineate cGVHD patient images. Crowdsourcing may have the potential to provide the requisite expert-level data.
Materials and methods: Forty-one three-dimensional photographs of three cutaneous cGVHD patients were delineated by a board-certified dermatologist. Each of 410 two-dimensional projections of the raw photos was annotated by seven crowd workers, whose consensus performance was compared with that of the expert.
Results: The consensus delineation by four of seven crowd workers achieved the highest agreement with the expert, measured by a median Dice index of 0.7551 across all 410 images, outperforming even the best worker from the crowd (Dice index 0.7216). For their internal agreement, crowd workers achieved a median Fleiss' kappa of 0.4140 across the images. The time a worker spent marking an image correlated only weakly with the surface area marked, and very weakly with accuracy. The percentage of pixels selected by the consensus exhibited good correlation (Pearson R = 0.81) with the patient's affected surface area.
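The agreement measures reported above can be illustrated with a minimal sketch. It assumes each delineation is available as a binary pixel mask and that Fleiss' kappa is computed per image by treating every pixel as a subject rated "affected" or "unaffected" by each worker; the abstract does not state how the study computed it, so that choice, the function names, and the toy data are all hypothetical and not the authors' implementation.

```python
import numpy as np


def consensus_mask(masks, min_votes):
    """Mark a pixel as affected if at least `min_votes` workers marked it.

    masks: array-like of shape (n_workers, H, W), truthy where a worker marked skin.
    """
    votes = np.asarray(masks, dtype=bool).sum(axis=0)
    return votes >= min_votes


def dice_index(a, b):
    """Dice index between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total


def fleiss_kappa_binary(masks):
    """Fleiss' kappa for one image, with pixels as subjects and two categories.

    Assumption: this pixel-level formulation is illustrative only.
    """
    masks = np.asarray(masks, dtype=bool)
    n = masks.shape[0]                                    # raters per pixel
    pos = masks.reshape(n, -1).sum(axis=0).astype(float)  # "affected" votes per pixel
    neg = n - pos                                         # "unaffected" votes per pixel
    # Observed agreement: fraction of agreeing rater pairs, averaged over pixels.
    p_obs = ((pos * (pos - 1) + neg * (neg - 1)) / (n * (n - 1))).mean()
    # Chance agreement from the overall share of each category.
    p_pos = pos.sum() / (pos.size * n)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    if p_exp == 1.0:
        return 1.0  # assumption: all raters chose the same category everywhere
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Toy 2x2 image with three hypothetical workers (not study data).
    expert = np.array([[1, 1], [0, 0]])
    workers = np.array([
        [[1, 1], [0, 0]],
        [[1, 0], [0, 0]],
        [[1, 1], [1, 0]],
    ])
    for k in (1, 2, 3):
        print(k, dice_index(consensus_mask(workers, k), expert))
    print(fleiss_kappa_binary(workers))
```

Sweeping `min_votes` from 1 to the number of workers and comparing each consensus mask with the expert mask is one way to find the vote threshold with the highest agreement, analogous to the four-of-seven result reported above.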
Conclusion: Crowdsourcing may be an efficient method for obtaining demarcations of affected skin, on par with expert performance. Crowdsourced data generally agreed with the current clinical standard of percent body surface area to assess cGVHD severity in the skin.
Keywords: body surface area; crowdsourcing; graft-vs-host disease; machine learning; photography; stem cell transplantation; three-dimensional imaging.
Published 2019. This article is a U.S. Government work and is in the public domain in the USA.