Deep learning algorithms for melanoma detection using dermoscopic images: A systematic review and meta-analysis

Zichen Ye; Daqian Zhang; Yuankai Zhao; Mingyang Chen; Huike Wang; Samuel Seery; Yimin Qu; Peng Xue; Yu Jiang

doi:10.1016/j.artmed.2024.102934

Artif Intell Med. 2024 Sep:155:102934. doi: 10.1016/j.artmed.2024.102934. Epub 2024 Jul 25.

Authors

Zichen Ye¹, Daqian Zhang¹, Yuankai Zhao¹, Mingyang Chen¹, Huike Wang¹, Samuel Seery², Yimin Qu¹, Peng Xue³, Yu Jiang⁴

Affiliations

¹ School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
² Population Health Sciences Institute, School of Pharmacy, Newcastle University, Newcastle NE1 7RU, United Kingdom of Great Britain and Northern Ireland.
³ School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. Electronic address: [email protected].
⁴ School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. Electronic address: [email protected].

PMID: 39088883
DOI: 10.1016/j.artmed.2024.102934

Abstract

Background: Melanoma is a serious risk to human health and early identification is vital for treatment success. Deep learning (DL) has the potential to detect cancer using imaging technologies and many studies provide evidence that DL algorithms can achieve high accuracy in melanoma diagnostics.

Objectives: To critically assess the performance of DL algorithms in diagnosing melanoma from dermatoscopic images and to discuss the relationship between dermatologists and DL.

Methods: Ovid-Medline, Embase, IEEE Xplore, and the Cochrane Library were systematically searched from inception until 7 December 2021. Studies that reported the diagnostic performance of DL models in detecting melanoma from dermatoscopic images were included if they had specific outcomes and histopathologic confirmation. Binary diagnostic accuracy data and contingency tables were extracted to analyze the outcomes of interest: sensitivity (SEN), specificity (SPE), and area under the curve (AUC). Subgroup analyses were performed according to human-machine comparison and cooperation. The study was registered in PROSPERO (CRD42022367824).
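As an illustration of how SEN and SPE follow from an extracted contingency table, the sketch below computes both from a single 2x2 table of DL output against histopathology. It is a minimal example, not the review's analysis code: the class name `ContingencyTable` and the counts are hypothetical and were invented for illustration, and the pooling across studies is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for one study: index test (DL) vs. histopathology.

    Hypothetical helper for illustration; not taken from the review.
    """
    tp: int  # melanoma, flagged as melanoma by the model
    fp: int  # non-melanoma, flagged as melanoma by the model
    fn: int  # melanoma, missed by the model
    tn: int  # non-melanoma, correctly not flagged

    def sensitivity(self) -> float:
        """Proportion of histopathology-confirmed melanomas detected."""
        diseased = self.tp + self.fn
        if diseased == 0:
            raise ValueError("no melanoma cases in table")
        return self.tp / diseased

    def specificity(self) -> float:
        """Proportion of non-melanoma lesions correctly not flagged."""
        healthy = self.tn + self.fp
        if healthy == 0:
            raise ValueError("no non-melanoma cases in table")
        return self.tn / healthy


# Invented counts, for illustration only (not from any included study).
example = ContingencyTable(tp=45, fp=20, fn=5, tn=180)
print(f"SEN = {example.sensitivity():.2f}, SPE = {example.specificity():.2f}")
```

With these invented counts, SEN is 45/50 and SPE is 180/200. AUC cannot be derived from a single table in this way, since it summarizes performance across decision thresholds.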

Results: A total of 2309 records were initially retrieved, of which 37 studies met our inclusion criteria and 27 provided sufficient data for meta-analytical synthesis. The pooled SEN was 82% (range 77-86) and the pooled SPE was 87% (range 84-90), with an AUC of 0.92 (range 0.89-0.94). In human-machine comparison, pooled AUCs were 0.87 (0.84-0.90) for DL and 0.83 (0.79-0.86) for dermatologists. Pooled AUCs were 0.90 (0.87-0.93) for DL, 0.80 (0.76-0.83) for junior dermatologists, and 0.88 (0.85-0.91) for senior dermatologists. In analyses of human-machine cooperation, pooled AUCs were 0.88 (0.85-0.91) for DL, 0.76 (0.72-0.79) for unassisted dermatologists, and 0.87 (0.84-0.90) for DL-assisted dermatologists.

Conclusions: Evidence suggests that DL algorithms are as accurate as senior dermatologists in melanoma diagnostics. Therefore, DL could be used to support dermatologists in diagnostic decision-making. However, further high-quality, large-scale multicenter studies are required to address the specific challenges associated with medical AI-based diagnostics.

Keywords: Deep learning; Human-machine comparison; Human-machine cooperation; Melanoma; Systematic review.

Publication types

Meta-Analysis
Systematic Review
Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning*
Dermoscopy* / methods
Humans
Melanoma* / diagnosis
Melanoma* / pathology
Skin / diagnostic imaging
Skin / pathology
Skin Neoplasms* / diagnosis
Skin Neoplasms* / pathology