A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes

Wei-Qi Wei; Cui Tao; Guoqian Jiang; Christopher G Chute

AMIA Annu Symp Proc. 2010 Nov 13:2010:857-61.

Authors

Wei-Qi Wei¹, Cui Tao, Guoqian Jiang, Christopher G Chute

Affiliation

¹ Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN.

PMID: 21347100
PMCID: PMC3041302

Abstract

Research on high-throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need for a general, automatic approach to patient identification.

Objective: We used Mayo Clinic electronic clinical notes to propose a novel method that combines natural language processing (NLP), machine learning, and ontology for automatic patient identification. We also investigated the benefits of incorporating existing SNOMED semantic knowledge into a patient identification task.

Methods: The support vector machine (SVM) algorithm was applied to SNOMED concept units extracted from the clinical notes of type 2 diabetes mellitus (T2DM) cases and controls. Precision, recall, and F-score were calculated to evaluate performance.
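
As an illustration of the inputs and the evaluation described in the Methods, the following minimal Python sketch builds a concept-frequency vector for one patient and computes precision, recall, and F-score for one class. It is not the authors' implementation: the function names, the placeholder concept codes, and the balanced F1 form of the F-score are assumptions, and the SVM training step itself is omitted.

```python
from collections import Counter


def concept_frequency_vector(concepts, vocabulary):
    """Count how often each vocabulary concept occurs in one patient's notes.

    `concepts` is the list of concept codes extracted from the notes;
    `vocabulary` fixes the order of the features (both are hypothetical inputs).
    """
    counts = Counter(concepts)
    return [counts[code] for code in vocabulary]


def precision_recall_f(y_true, y_pred, positive):
    """Return (precision, recall, F-score) for the class labelled `positive`.

    The F-score here is the balanced F1 (harmonic mean), an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder codes, not real SNOMED identifiers.
vocabulary = ["C1", "C2", "C3"]
features = concept_frequency_vector(["C1", "C3", "C1"], vocabulary)

# Scores are computed per class, so the call is repeated for "case" and "control".
scores = precision_recall_f(
    ["case", "case", "control", "control"],
    ["case", "control", "control", "case"],
    positive="case",
)
```

Vectors of this kind, one per patient, would be the feature rows passed to a classifier; computing the metrics separately for each class matches the abstract's reporting of an F-score for both groups.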

Results: This approach achieved an F-score above 0.950 for both groups when all identified concept units were used as features. Concept units from the semantic type Disease or Syndrome contained the most important information for patient identification. Our results also implied that coarse-level concepts contain enough information to classify T2DM cases and controls.
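
One possible reading of "coarse level concepts" is that fine-grained concepts are rolled up to an ancestor in the concept hierarchy before counting. The sketch below illustrates that idea under stated assumptions: the toy is-a hierarchy, the concept names, and the depth cutoff are all hypothetical and are not taken from SNOMED or from the paper.

```python
from collections import Counter

# Hypothetical toy is-a hierarchy (child -> parent); names are placeholders,
# not real SNOMED concepts. The root has parent None.
PARENT = {
    "disease": None,
    "endocrine_disorder": "disease",
    "diabetes_mellitus": "endocrine_disorder",
    "type_2_diabetes": "diabetes_mellitus",
    "cardiovascular_disorder": "disease",
    "hypertension": "cardiovascular_disorder",
}


def path_from_root(concept, parent=PARENT):
    """Return the chain of concepts from the root down to `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path[::-1]


def roll_up(concept, max_depth, parent=PARENT):
    """Map a concept to its ancestor at `max_depth` (root is depth 0).

    Concepts already at or above that depth are returned unchanged.
    """
    path = path_from_root(concept, parent)
    return path[min(max_depth, len(path) - 1)]


def coarse_counts(concepts, max_depth):
    """Count concept frequencies after rolling each concept up."""
    return Counter(roll_up(c, max_depth) for c in concepts)


counts = coarse_counts(["type_2_diabetes", "diabetes_mellitus", "hypertension"], 1)
```

With a cutoff of depth 1, both diabetes mentions collapse into the same coarse feature, which shrinks the feature space while keeping the broad disease category.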

MeSH terms

Algorithms
Diabetes Mellitus, Type 2
Electronic Health Records
Humans
Natural Language Processing*
Semantics*
Systematized Nomenclature of Medicine

Grants and funding

U01 HG004599/HG/NHGRI NIH HHS/United States