Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning

Amir M Tahmasebi; Henghui Zhu; Gabriel Mankovich; Peter Prinsen; Prescott Klassen; Sam Pilato; Rob van Ommering; Pritesh Patel; Martin L Gunn; Paul Chang

doi:10.1007/s10278-018-0116-5

J Digit Imaging. 2019 Feb;32(1):6-18. doi: 10.1007/s10278-018-0116-5.

Affiliations

¹ Philips Research North America, 2 Canal Park, 3rd Floor, Cambridge, MA, 02141, USA.
² Division of Systems Engineering, Boston University, Brookline, MA, USA.
³ Philips Research North America, 2 Canal Park, 3rd Floor, Cambridge, MA, 02141, USA.
⁴ Philips Research, Eindhoven, North Brabant, The Netherlands.
⁵ Department of Radiology, University of Chicago Medical Center, Chicago, IL, USA.
⁶ Department of Radiology, University of Washington, Seattle, WA, USA.

Abstract

In today's radiology workflow, free-text reporting is established as the most common medium to capture, store, and communicate clinical information. Radiologists routinely refer to a patient's prior radiology reports to recall critical information for a new diagnosis, a process that is tedious, time-consuming, and prone to human error. Automatic structuring of report content is therefore desirable to facilitate such information retrieval. In this work, we propose an unsupervised machine learning approach to automatically structure radiology reports by detecting and normalizing anatomical phrases based on the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) ontology. The proposed approach combines word embedding-based semantic learning with ontology-based concept mapping to derive the desired concept normalization. The word embedding model was trained on a large corpus of unlabeled radiology reports. Fifty-six anatomical labels covering the whole human anatomy were extracted from SNOMED CT as class labels. The proposed framework was compared against a number of state-of-the-art supervised and unsupervised approaches. Radiology reports from three different clinical sites were manually labeled for testing. The proposed approach outperformed the other techniques, yielding an average precision of 82.6%. By applying word embedding techniques to semantic learning, the proposed framework improves the coverage and performance of conventional concept normalization approaches while avoiding the need for a large amount of annotated data, which is typically required to train classifiers.
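To make the idea of embedding-based concept normalization more concrete, the following is a minimal sketch of one simple way a phrase could be mapped to an anatomical class label: average the word vectors of the phrase and pick the label whose vector is most cosine-similar. This is not the authors' method. The toy 3-dimensional vectors in `WORD_VECTORS`, the three stand-in labels in `LABELS`, and the helper names `embed`, `cosine`, and `normalize` are all hypothetical; a real system would use vectors learned (e.g., with word2vec) from a large report corpus and the SNOMED CT-derived label set, and would add ontology-based mapping on top.

```python
import math

# Hypothetical toy word vectors. In practice these would be learned from a
# large corpus of unlabeled radiology reports (e.g., with word2vec).
WORD_VECTORS = {
    "liver": [0.9, 0.1, 0.0],
    "hepatic": [0.85, 0.15, 0.05],
    "lobe": [0.5, 0.5, 0.0],
    "lung": [0.1, 0.9, 0.1],
    "pulmonary": [0.15, 0.85, 0.1],
    "kidney": [0.0, 0.1, 0.95],
    "renal": [0.05, 0.1, 0.9],
}

# Hypothetical stand-ins for the SNOMED CT anatomical class labels.
LABELS = ["liver", "lung", "kidney"]


def embed(phrase):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    vecs = [WORD_VECTORS[t] for t in phrase.lower().split() if t in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(phrase, labels):
    """Map a phrase to the label whose embedding is most cosine-similar, or None."""
    p = embed(phrase)
    if p is None:
        return None  # no known tokens: the phrase cannot be normalized
    scored = []
    for label in labels:
        v = embed(label)
        if v is not None:
            scored.append((cosine(p, v), label))
    return max(scored)[1] if scored else None


print(normalize("right hepatic lobe", LABELS))  # liver
print(normalize("renal", LABELS))  # kidney
```

Out-of-vocabulary tokens such as "right" are simply ignored here, which is one reason a purely embedding-based mapping benefits from being combined with ontology-based concept mapping.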

Keywords: Anatomical classification; Concept normalization; Radiology reports; SNOMED CT; Semantic learning; word2vec.

MeSH terms

Electronic Health Records*
Humans
Radiology / methods*
Terminology as Topic*
Unsupervised Machine Learning*
Workflow