Purpose: To assess whether semisupervised natural language processing (NLP) of text from clinical radiology reports could provide useful automated diagnosis categorization for ground truth labeling, overcoming the manual labeling bottleneck in the machine learning pipeline.
Materials and methods: In this retrospective study, 1503 cardiac MRI text reports from 2016 to 2019 were manually annotated by clinicians for five diagnoses: normal, dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy, myocardial infarction (MI), and myocarditis. A semisupervised method based on bidirectional encoder representations from transformers (BERT), pretrained on 1.14 million scientific publications, was fine-tuned by using the manually extracted labels, with the reports split into groups of 801 for training, 302 for validation, and 400 for testing. The model's performance was compared with that of two traditional NLP models: a rule-based model and a support vector machine (SVM) model. F1 scores and receiver operating characteristic curves were used to analyze the models' performance.
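The 801/302/400 partition described above can be sketched as follows. This is a minimal illustration only: the source does not say how reports were assigned to each group, so the seeded random shuffle, the function name `split_reports`, and the integer report identifiers are all assumptions.

```python
import random


def split_reports(report_ids, n_train=801, n_val=302, n_test=400, seed=0):
    """Partition report identifiers into training, validation, and test groups.

    Assumption: a seeded random shuffle is used here for illustration;
    the study does not state its assignment procedure.
    """
    ids = list(report_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("group sizes must sum to the number of reports")
    # A local Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers 0..1502 stand in for the 1503 reports.
train_ids, val_ids, test_ids = split_reports(range(1503))
```

Keeping the three groups disjoint ensures that the 400 test reports are never seen during fine-tuning or model selection.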
Results: After 15 epochs, the F1 scores on the test set of 400 reports were as follows: normal, 84%; DCM, 79%; hypertrophic cardiomyopathy, 86%; MI, 91%; and myocarditis, 86%. The pooled F1 score and area under the receiver operating characteristic curve were 86% and 0.96, respectively. On the same test set, the BERT model outperformed the rule-based model (F1 score, 42%) and the SVM model (F1 score, 82%). Using the diagnosis categories classified by the BERT model, 1000 MR images were labeled in 0.2 second.
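For readers less familiar with the metric, the sketch below shows one way per-diagnosis and pooled F1 scores can be computed from true and predicted labels. It is a hedged illustration, not the study's evaluation code: the source does not define "pooled," so micro-averaging (summing counts across diagnoses before computing F1) is an assumption, as are the function names, the abbreviation "HCM," and the toy data. Each report's labels are held in a set, which accommodates one or several diagnoses per report; the source does not say which applies.

```python
LABELS = ["normal", "DCM", "HCM", "MI", "myocarditis"]


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_class_and_pooled_f1(y_true, y_pred, labels=LABELS):
    """Return ({label: F1}, pooled F1) for paired lists of label sets.

    Assumption: "pooled" is taken to mean micro-averaged, i.e. the
    TP/FP/FN counts are summed over all labels before computing F1.
    """
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn]
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    per_class = {label: f1_from_counts(*c) for label, c in counts.items()}
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return per_class, f1_from_counts(tp, fp, fn)


# Toy data (not from the study): four reports, one DCM report mislabeled as MI.
toy_true = [{"MI"}, {"DCM"}, {"normal"}, {"MI"}]
toy_pred = [{"MI"}, {"MI"}, {"normal"}, {"MI"}]
toy_per_class, toy_pooled = per_class_and_pooled_f1(toy_true, toy_pred)
```

In the toy data, the single mislabeled report lowers the MI score through a false positive and the DCM score through a false negative, which is why per-diagnosis scores can differ noticeably from the pooled figure.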
Conclusion: The developed model used labels extracted from radiology reports to provide automated diagnosis categorization of MR images with a high level of performance.
Supplemental material is available for this article. © RSNA, 2021.
Keywords: Diagnosis/Classification/Application Domain; MRI; Named Entity Recognition; Semisupervised Learning.