Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

H A Haenssle; C Fink; R Schneiderbauer; F Toberer; T Buhl; A Blum; A Kalloo; A Ben Hadj Hassen; L Thomas; A Enk; L Uhlmann; Reader study level-I and level-II Groups; Christina Alt; Monika Arenbergerova; Renato Bakos; Anne Baltzer; Ines Bertlich; Andreas Blum; Therezia Bokor-Billmann; Jonathan Bowling; Naira Braghiroli; Ralph Braun; Kristina Buder-Bakhaya; Timo Buhl; Horacio Cabo; Leo Cabrijan; Naciye Cevic; Anna Classen; David Deltgen; Christine Fink; Ivelina Georgieva; Lara-Elena Hakim-Meibodi; Susanne Hanner; Franziska Hartmann; Julia Hartmann; Georg Haus; Elti Hoxha; Raimonds Karls; Hiroshi Koga; Jürgen Kreusch; Aimilios Lallas; Pawel Majenka; Ash Marghoob; Cesare Massone; Lali Mekokishvili; Dominik Mestel; Volker Meyer; Anna Neuberger; Kari Nielsen; Margaret Oliviero; Riccardo Pampena; John Paoli; Erika Pawlik; Barbar Rao; Adriana Rendon; Teresa Russo; Ahmed Sadek; Kinga Samhaber; Roland Schneiderbauer; Anissa Schweizer; Ferdinand Toberer; Lukas Trennheuser; Lyobomira Vlahova; Alexander Wald; Julia Winkler; Priscila Wölbing; Iris Zalaudek

doi:10.1093/annonc/mdy166

Ann Oncol. 2018 Aug 1;29(8):1836-1842. doi: 10.1093/annonc/mdy166.

Authors

H A Haenssle¹, C Fink², R Schneiderbauer², F Toberer², T Buhl³, A Blum⁴, A Kalloo⁵, A Ben Hadj Hassen⁶, L Thomas⁷, A Enk², L Uhlmann⁸; Reader study level-I and level-II Groups; Christina Alt, Monika Arenbergerova, Renato Bakos, Anne Baltzer, Ines Bertlich, Andreas Blum, Therezia Bokor-Billmann, Jonathan Bowling, Naira Braghiroli, Ralph Braun, Kristina Buder-Bakhaya, Timo Buhl, Horacio Cabo, Leo Cabrijan, Naciye Cevic, Anna Classen, David Deltgen, Christine Fink, Ivelina Georgieva, Lara-Elena Hakim-Meibodi, Susanne Hanner, Franziska Hartmann, Julia Hartmann, Georg Haus, Elti Hoxha, Raimonds Karls, Hiroshi Koga, Jürgen Kreusch, Aimilios Lallas, Pawel Majenka, Ash Marghoob, Cesare Massone, Lali Mekokishvili, Dominik Mestel, Volker Meyer, Anna Neuberger, Kari Nielsen, Margaret Oliviero, Riccardo Pampena, John Paoli, Erika Pawlik, Barbar Rao, Adriana Rendon, Teresa Russo, Ahmed Sadek, Kinga Samhaber, Roland Schneiderbauer, Anissa Schweizer, Ferdinand Toberer, Lukas Trennheuser, Lyobomira Vlahova, Alexander Wald, Julia Winkler, Priscila Wölbing, Iris Zalaudek

Affiliations

¹ Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
² Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
³ Department of Dermatology, University of Göttingen, Göttingen, Germany.
⁴ Office Based Clinic of Dermatology, Konstanz, Germany.
⁵ Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, USA.
⁶ Faculty of Computer Science and Mathematics, University of Passau, Passau, Germany.
⁷ Department of Dermatology, Lyons Cancer Research Center, Lyon 1 University, Lyon, France.
⁸ Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany.

PMID: 29846502

Abstract

Background: Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking.

Methods: Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or -II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and -II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge.
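The main outcome measures named above can be illustrated with a minimal sketch. It assumes dichotomous labels (1 = melanoma, 0 = benign) and a continuous classifier score; the function names, the 0.5 cut-off and the ten-lesion toy data are hypothetical and are not taken from the study. The AUC is computed here via its rank-based interpretation (the probability that a randomly chosen melanoma scores higher than a randomly chosen benign lesion), which may differ from the software the authors used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for dichotomous labels (1 = melanoma, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanomas called melanoma
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanomas missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign lesions called benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign lesions called melanoma
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 melanomas and 6 benign lesions (not study data).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

# Dichotomise the scores at an assumed cut-off of 0.5.
toy_calls = [1 if s >= 0.5 else 0 for s in toy_scores]

sens, spec = sensitivity_specificity(toy_labels, toy_calls)
auc = roc_auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

A dermatologist's dichotomous classification yields a single (sensitivity, specificity) pair, whereas a classifier that outputs a score yields a full ROC curve, one point per cut-off.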

Results: In level-I, dermatologists achieved a mean (±standard deviation) sensitivity and specificity for lesion classification of 86.6% (±9.3%) and 71.3% (±11.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (±9.6%, P = 0.19) and specificity to 75.7% (±11.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge.
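Comparing a CNN with readers "at their sensitivities" means reading the CNN's specificity off its ROC curve at the operating point that matches the readers' mean sensitivity. The sketch below shows one simple way to do this, assuming no interpolation between ROC points: it picks the highest score cut-off whose sensitivity reaches the target. The abstract does not state how the operating point was determined, so this procedure, the function name and the toy data are all assumptions for illustration.

```python
from typing import Sequence


def specificity_at_sensitivity(
    y_true: Sequence[int], scores: Sequence[float], target_sensitivity: float
) -> float:
    """Specificity at the highest cut-off whose sensitivity is >= target_sensitivity.

    A lesion is called melanoma when its score is >= the cut-off.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Lowering the cut-off can only raise sensitivity and lower specificity,
    # so the first cut-off that reaches the target has the best specificity.
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity = sum(1 for s in pos if s >= cutoff) / len(pos)
        if sensitivity >= target_sensitivity:
            return sum(1 for s in neg if s < cutoff) / len(neg)
    return 0.0  # only reached if the target exceeds 1.0


# Hypothetical toy data: 4 melanomas and 6 benign lesions (not study data).
example_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

spec_at_75 = specificity_at_sensitivity(example_labels, example_scores, 0.75)
spec_at_100 = specificity_at_sensitivity(example_labels, example_scores, 1.0)
print(f"specificity at sensitivity >= 0.75: {spec_at_75:.3f}")
print(f"specificity at sensitivity >= 1.00: {spec_at_100:.3f}")
```

In the toy data, demanding a higher sensitivity lowers the achievable specificity, which is why the comparison fixes the sensitivity before comparing specificities.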

Conclusions: For the first time we compared a CNN's diagnostic performance with a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Physicians, irrespective of their experience, may benefit from assistance by a CNN's image classification.

Clinical trial number: This study was registered at the German Clinical Trial Register (DRKS-Study-ID: DRKS00013570; https://www.drks.de/drks_web/).

Publication types

Comparative Study
Observational Study
Validation Study

MeSH terms

Clinical Competence
Cross-Sectional Studies
Deep Learning*
Dermatologists / statistics & numerical data*
Dermoscopy
Humans
Image Processing, Computer-Assisted / methods*
Image Processing, Computer-Assisted / statistics & numerical data
International Cooperation
Melanoma / diagnostic imaging*
ROC Curve
Retrospective Studies
Skin / diagnostic imaging
Skin Neoplasms / diagnostic imaging*

Associated data

DRKS/DRKS00013570