Background: Measures of cardiac structure and function are important human phenotypes that are associated with a range of clinical outcomes. Studying these traits in large populations can be time-consuming and costly. Using data from large electronic medical record (EMR) systems is one possible solution to this problem. We describe the extraction and filtering of quantitative transthoracic echocardiographic data from the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, a large, racially diverse, EMR-based cohort (n = 15,863).
Results: There were 6,076 echocardiography reports for 2,834 unique adult subjects. Missing data were uncommon, with over 90% of data points present. Data irregularities were primarily related to inconsistent use of measurement units and to transcriptional errors. The reported filtering method required manual review of very few data points (<1%), and the filtered echocardiographic parameters are similar to published data from epidemiologic populations of similar ethnicity. Moreover, the cohort is comparable in size to, and in some cases larger than, community-based cohorts of similar race/ethnicity.
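The abstract does not specify the filtering rules. As a purely hypothetical illustration of how unit inconsistencies and transcriptional errors might be handled, the sketch below accepts a value if it already falls within an assumed plausible range in millimetres, rescales it if it becomes plausible only after a centimetre-to-millimetre conversion, and otherwise flags it for manual review. The parameter names (`lvidd`, `ivsd`), the ranges in `PLAUSIBLE_MM`, and the function `normalize` are illustrative assumptions, not the authors' method and not clinical reference limits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausible ranges in millimetres (illustrative only,
# NOT clinical reference limits and NOT the study's actual rules).
PLAUSIBLE_MM = {
    "lvidd": (25.0, 90.0),  # left ventricular internal diameter, diastole
    "ivsd": (4.0, 30.0),    # interventricular septal thickness, diastole
}


@dataclass
class FilterResult:
    value_mm: Optional[float]  # normalized value, or None if unresolved
    needs_review: bool         # True if the value should be checked by hand


def normalize(param: str, raw: float) -> FilterResult:
    """Hypothetical unit-normalization filter for one measurement."""
    lo, hi = PLAUSIBLE_MM[param]
    # Case 1: value is already plausible as millimetres.
    if lo <= raw <= hi:
        return FilterResult(raw, False)
    # Case 2: assume the value was recorded in centimetres and rescale.
    scaled = raw * 10.0
    if lo <= scaled <= hi:
        return FilterResult(scaled, False)
    # Case 3: neither interpretation is plausible (e.g. a transcription
    # error), so leave the value unresolved and flag it for manual review.
    return FilterResult(None, True)
```

For example, under these assumed ranges `normalize("lvidd", 4.8)` is interpreted as 4.8 cm and returned as roughly 48 mm, while `normalize("lvidd", 480.0)` is flagged for review. Checking the millimetre interpretation first is a design choice of this sketch; a real filter would need parameter-specific rules for values that are plausible under both units.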
Conclusions: These results demonstrate that echocardiographic data can be efficiently extracted from EMRs, and suggest that EMR-based cohorts have the potential to make major contributions toward the study of epidemiologic and genotype-phenotype associations for cardiac structure and function in diverse populations.
Keywords: Echocardiography; Electronic health records; Natural language processing.