Importance: Myopic maculopathy (MM) is a major cause of vision impairment globally. Artificial intelligence (AI) and deep learning (DL) algorithms for detecting MM from fundus images could potentially improve diagnosis and assist screening in a variety of health care settings.
Objectives: To evaluate DL algorithms for MM classification and segmentation and compare their performance with that of ophthalmologists.
Design, setting, and participants: The Myopic Maculopathy Analysis Challenge (MMAC) was an international competition to develop automated solutions for 3 tasks: (1) MM classification, (2) segmentation of MM plus lesions, and (3) spherical equivalent (SE) prediction. Participants were provided 3 subdatasets containing 2306, 294, and 2003 fundus images, respectively, with which to build algorithms. A group of 5 ophthalmologists evaluated the same test sets for tasks 1 and 2 to ascertain performance. Results from model ensembles, which combined outcomes from multiple algorithms submitted by MMAC participants, were compared with each individual submitted algorithm. This study was conducted from March 1, 2023, to March 30, 2024, and data were analyzed from January 15, 2024, to March 30, 2024.
Exposure: DL algorithms submitted as part of the MMAC competition or ophthalmologist interpretation.
Main outcomes and measures: MM classification was evaluated by quadratic-weighted κ (QWK), F1 score, sensitivity, and specificity. MM plus lesions segmentation was evaluated by dice similarity coefficient (DSC), and SE prediction was evaluated by R2 and mean absolute error (MAE).
Results: The 3 tasks were completed by 7, 4, and 4 teams, respectively. MM classification algorithms achieved a QWK range of 0.866 to 0.901, an F1 score range of 0.675 to 0.781, a sensitivity range of 0.667 to 0.778, and a specificity range of 0.931 to 0.945. MM plus lesions segmentation algorithms achieved a DSC range of 0.664 to 0.687 for lacquer cracks (LC), 0.579 to 0.673 for choroidal neovascularization, and 0.768 to 0.841 for Fuchs spot (FS). SE prediction algorithms achieved an R2 range of 0.791 to 0.874 and an MAE range of 0.708 to 0.943. Model ensemble results achieved the best performance compared to each submitted algorithms, and the model ensemble outperformed ophthalmologists at MM classification in sensitivity (0.801; 95% CI, 0.764-0.840 vs 0.727; 95% CI, 0.684-0.768; P = .006) and specificity (0.946; 95% CI, 0.939-0.954 vs 0.933; 95% CI, 0.925-0.941; P = .009), LC segmentation (DSC, 0.698; 95% CI, 0.649-0.745 vs DSC, 0.570; 95% CI, 0.515-0.625; P < .001), and FS segmentation (DSC, 0.863; 95% CI, 0.831-0.888 vs DSC, 0.790; 95% CI, 0.742-0.830; P < .001).
Conclusions and relevance: In this diagnostic study, 15 AI models for MM classification and segmentation on a public dataset made available for the MMAC competition were validated and evaluated, with some models achieving better diagnostic performance than ophthalmologists.