Facilitating pharmacogenetic studies using electronic health records and natural-language processing: a case study of warfarin

Hua Xu¹; Min Jiang; Matt Oetjens; Erica A Bowton; Andrea H Ramirez; Janina M Jeff; Melissa A Basford; Jill M Pulley; James D Cowan; Xiaoming Wang; Marylyn D Ritchie; Daniel R Masys; Dan M Roden; Dana C Crawford; Joshua C Denny

doi:10.1136/amiajnl-2011-000208

J Am Med Inform Assoc. 2011 Jul-Aug;18(4):387-91. doi: 10.1136/amiajnl-2011-000208.

Affiliation

¹ Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee 37232, USA.

Abstract

Objective: DNA biobanks linked to comprehensive electronic health record (EHR) systems are potentially powerful resources for pharmacogenetic studies. This study sought to develop natural-language-processing (NLP) algorithms that extract drug-dose information from clinical text, and to assess how well such tools can automate data extraction for pharmacogenetic studies.

Materials and methods: A manually validated warfarin pharmacogenetic study identified a cohort of 1125 patients with a stable warfarin dose, of whom 776 were managed by Coumadin Clinic physicians and the remaining 349 by their own providers. The authors developed two algorithms to extract weekly warfarin doses from these two data sets: a regular-expression-based program for semistructured Coumadin Clinic notes, and an advanced weekly dose calculator, built on an existing medication information extraction system (MedEx), for providers' narrative notes. The weekly dose-extraction program was evaluated against a gold standard of manually curated weekly doses, and precision, recall, F-measure, and overall accuracy were reported. The authors then tested associations between the automatically extracted stable weekly warfarin dose and four known genetic variants in VKORC1 and CYP2C9, using linear regression adjusted for age, gender, and body mass index.
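The regular-expression approach for semistructured Coumadin Clinic notes can be illustrated with a minimal Python sketch. The note layout assumed here (an optional "Weekly dose: N mg" statement, otherwise one "Day N mg" entry per weekday) is hypothetical, as are the names weekly_dose_mg, WEEKLY_TOTAL, and DAY_DOSE; the abstract does not describe the actual patterns used, and this sketch does not model the MedEx-based calculator for narrative notes.

```python
import re
from typing import Optional

# Hypothetical pattern for an explicitly stated weekly total, e.g. "Weekly dose: 32.5 mg".
WEEKLY_TOTAL = re.compile(r"weekly\s+dose\s*[:=]?\s*(\d+(?:\.\d+)?)\s*mg", re.IGNORECASE)

# Hypothetical day-by-day schedule entries, e.g. "Mon 2.5 mg" or "Tuesday: 5mg".
# Group 1 captures the three-letter day prefix, group 2 the dose in mg.
DAY_DOSE = re.compile(
    r"\b(sun|mon|tue|wed|thu|fri|sat)[a-z]*\.?\s*[:\-]?\s*(\d+(?:\.\d+)?)\s*mg",
    re.IGNORECASE,
)

DAYS = ("sun", "mon", "tue", "wed", "thu", "fri", "sat")


def weekly_dose_mg(note: str) -> Optional[float]:
    """Return the weekly warfarin dose in mg, or None if it cannot be determined."""
    # Prefer an explicitly stated weekly total when the note has one.
    total = WEEKLY_TOTAL.search(note)
    if total:
        return float(total.group(1))
    # Otherwise collect one dose per weekday; a later mention of a day overwrites an earlier one.
    per_day = {}
    for day, dose in DAY_DOSE.findall(note):
        per_day[day.lower()] = float(dose)
    if set(per_day) != set(DAYS):
        return None  # incomplete schedule: leave the note for manual review
    return sum(per_day.values())


note = "INR 2.4. Plan: Sun 5 mg, Mon 2.5 mg, Tue 5 mg, Wed 5 mg, Thu 2.5 mg, Fri 5 mg, Sat 5 mg."
print(weekly_dose_mg(note))
```

For the sample note above, the seven daily doses sum to 30.0 mg. Returning None for an incomplete schedule, rather than extrapolating, routes ambiguous notes to a human reviewer instead of silently producing a wrong dose.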
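A single-variant form of the adjusted linear model can be written as below. The outcome scale (untransformed weekly dose), the additive genotype coding G_ij ∈ {0, 1, 2} (number of variant alleles), and fitting each of the four variants in a separate model are assumptions; the abstract states only that linear regression adjusted for age, gender, and body mass index was used.

```latex
D_i = \beta_0 + \beta_G\,G_{ij} + \beta_{\text{age}}\,\text{age}_i + \beta_{\text{sex}}\,\text{sex}_i + \beta_{\text{BMI}}\,\text{BMI}_i + \varepsilon_i,
\qquad H_0\colon \beta_G = 0
```

Here D_i is the stable weekly warfarin dose (mg/week) of patient i, G_ij is that patient's genotype at variant j, and ε_i is the residual. Under this coding, a negative β_G would mean that each additional variant allele is associated with a lower stable weekly dose.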

Results: The authors' evaluation showed that the MedEx-based system determined patients' weekly warfarin doses with 99.7% recall, 90.8% precision, and 93.8% accuracy. Using the automatically extracted weekly warfarin doses, the authors replicated the previously known associations between stable warfarin dose and genetic variants in VKORC1 and CYP2C9.
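The abstract reports precision and recall but not the F-measure. Assuming the standard definitions (the abstract does not give the exact counting rules used against the gold standard), the F-measure is the harmonic mean of the two, and the rounded figures above imply a value of about 0.950. The helper names below are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of extracted doses that match the gold standard."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of gold-standard doses that the extractor recovered."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


# F1 implied by the rounded precision (90.8%) and recall (99.7%) reported in the abstract.
implied_f1 = f_measure(0.908, 0.997)
print(f"{implied_f1:.3f}")  # prints 0.950
```

Because the inputs are rounded percentages, this is only an approximation of whatever F-measure the full paper reports.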

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Anticoagulants / administration & dosage*
Aryl Hydrocarbon Hydroxylases / genetics
Cytochrome P-450 CYP2C9
Data Mining / methods*
Databases, Nucleic Acid*
Drug Dosage Calculations*
Electronic Health Records*
Genome-Wide Association Study
Humans
Linear Models
Mixed Function Oxygenases / genetics
Natural Language Processing*
Pharmacogenetics
Precision Medicine
United States
Vitamin K Epoxide Reductases
Warfarin / administration & dosage*

Substances

Anticoagulants
Warfarin
Mixed Function Oxygenases
CYP2C9 protein, human
Cytochrome P-450 CYP2C9
Aryl Hydrocarbon Hydroxylases
VKORC1 protein, human
Vitamin K Epoxide Reductases
