Subspecialty-Level Deep Gray Matter Differential Diagnoses with Deep Learning and Bayesian Networks on Clinical Brain MRI: A Pilot Study

Radiol Artif Intell. 2020 Sep 23;2(5):e190146. doi: 10.1148/ryai.2020190146. eCollection 2020 Sep.

Abstract

Purpose: To develop and validate a system that could perform automated diagnosis of common and rare neurologic diseases involving deep gray matter on clinical brain MRI studies.

Materials and methods: In this retrospective study, multimodal brain MRI scans from 212 patients (mean age, 55 years ± 17 [standard deviation]; 113 women) with 35 neurologic diseases and normal brain MRI scans obtained between January 2008 and January 2018 were included (110 patients in the training set, 102 patients in the test set). MRI scans from 178 patients (mean age, 48 years ± 17; 106 women) were used to supplement training of the neural networks. Three-dimensional convolutional neural networks and atlas-based image processing were used for extraction of 11 imaging features. Expert-derived Bayesian networks incorporating domain knowledge were used for differential diagnosis generation. The performance of the artificial intelligence (AI) system was assessed by comparing diagnostic accuracy with that of radiologists of varying levels of specialization by using the generalized estimating equation with robust variance estimator for the top three differential diagnoses (T3DDx) and the correct top diagnosis (TDx), as well as with receiver operating characteristic analyses.
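The differential-diagnosis step and the two accuracy measures (T3DDx and TDx) can be illustrated with a minimal sketch. It assumes binary imaging features that are conditionally independent given the diagnosis (a naive Bayes simplification, not necessarily the structure of the expert-derived networks in the study). All diagnosis names, feature names, and probabilities below are hypothetical placeholders, not values from the study.

```python
from typing import Dict, List


def posterior(prior: Dict[str, float],
              likelihoods: Dict[str, Dict[str, float]],
              findings: Dict[str, bool]) -> Dict[str, float]:
    """Posterior over diagnoses, assuming conditionally independent binary features."""
    scores = {}
    for dx, p_dx in prior.items():
        score = p_dx
        for feature, present in findings.items():
            p_feature = likelihoods[dx][feature]  # P(feature present | dx)
            score *= p_feature if present else 1.0 - p_feature
        scores[dx] = score
    total = sum(scores.values())
    return {dx: s / total for dx, s in scores.items()}


def top_k(post: Dict[str, float], k: int) -> List[str]:
    """Diagnoses ranked by posterior probability, truncated to the top k."""
    return sorted(post, key=post.get, reverse=True)[:k]


def top_k_accuracy(ranked: List[List[str]], truths: List[str], k: int) -> float:
    """Fraction of cases whose true diagnosis is among the top k (k=3: T3DDx, k=1: TDx)."""
    hits = sum(truth in ranking[:k] for ranking, truth in zip(ranked, truths))
    return hits / len(truths)


# Hypothetical toy inputs, for illustration only.
prior = {"dx_a": 0.5, "dx_b": 0.3, "dx_c": 0.2}
likelihoods = {
    "dx_a": {"t2_hyperintense": 0.9, "restricted_diffusion": 0.1},
    "dx_b": {"t2_hyperintense": 0.8, "restricted_diffusion": 0.8},
    "dx_c": {"t2_hyperintense": 0.2, "restricted_diffusion": 0.5},
}
findings = {"t2_hyperintense": True, "restricted_diffusion": True}

post = posterior(prior, likelihoods, findings)
ranking = top_k(post, 3)
```

In this toy case the two findings together favor `dx_b` over the more common `dx_a`, which shows how feature likelihoods can outweigh the prior when a ranked differential is produced.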

Results: In the held-out test set, the imaging pipeline detected 11 key features on brain MRI scans with 89% accuracy (sensitivity, 81%; specificity, 95%) relative to academic neuroradiologists. The Bayesian network, integrating imaging features with clinical information, had an accuracy of 85% for T3DDx and 64% for TDx, which was better than that of radiology residents (n = 4; 56% for T3DDx, 36% for TDx; P < .001 for both) and general radiologists (n = 2; 53% for T3DDx, 31% for TDx; P < .001 for both). The accuracy of the Bayesian network was better than that of neuroradiology fellows (n = 2) for T3DDx (72%; P = .003) but not for TDx (59%; P = .19) and was not different from that of academic neuroradiologists (n = 2; 84% T3DDx, 65% TDx; P > .09 for both).
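For reference, the feature-level accuracy, sensitivity, and specificity reported above are standard confusion-matrix quantities. The sketch below shows how they could be computed from binary feature calls compared against a reference reading; the function name and the example labels are hypothetical and do not reproduce the study data.

```python
from typing import Dict, Sequence


def feature_metrics(predicted: Sequence[int], reference: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of binary calls against a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when one class is absent.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical feature calls (1 = feature present), for illustration only.
pipeline_calls = [1, 1, 0, 0, 1, 0]
reference_calls = [1, 0, 0, 0, 1, 1]
metrics = feature_metrics(pipeline_calls, reference_calls)
```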

Conclusion: A hybrid AI system was developed that simultaneously provides a quantitative assessment of disease burden, explainable intermediate imaging features, and a probabilistic differential diagnosis that performed at the level of academic neuroradiologists. This type of approach has the potential to improve clinical decision making for common and rare diseases.

Supplemental material is available for this article.

© RSNA, 2020.