A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

BMC Med Inform Decis Mak. 2019 Sep 9;19(1):184. doi: 10.1186/s12911-019-0908-7.

Abstract

Background: Manual coding of phenotypes in brain radiology reports is time consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging in radiology reports performed in routine clinical practice in the UK National Health Service (NHS).

Methods: We used anonymized text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and further evaluated it in the reports from the regional hospital.

Results: The agreement between expert readers was excellent (Cohen's κ =0.93) in both datasets. In the final test dataset (n = 700) in unseen regional hospital reports, the algorithm had very good performance for a report of any ischaemic stroke [sensitivity 89% (95% CI:81-94); positive predictive value (PPV) 85% (76-90); specificity 100% (95% CI:0.99-1.00)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80-99), PPV 72% (95% CI:55-84); specificity 100% (95% CI:0.99-1.00)]; brain tumours [sensitivity 96% (CI:87-99); PPV 84% (73-91); specificity: 100% (95% CI:0.99-1.00)] and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (15,015, 14%) and old, deep ischaemic strokes (10,636, 10%) were the commonest findings.

Conclusions: An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.

Keywords: Brain imaging; Natural language processing; Phenotyping; Radiology; Radiology reports; Stroke.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Cohort Studies
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Neuroimaging*
  • Radiology*
  • State Medicine
  • Stroke / diagnostic imaging
  • United Kingdom
  • Young Adult