Combining population-based administrative health records and electronic medical records for disease surveillance

Saeed Al-Azazi; Alexander Singer; Rasheda Rabbani; Lisa M Lix

doi:10.1186/s12911-019-0845-5

BMC Med Inform Decis Mak. 2019 Jul 2;19(1):120. doi: 10.1186/s12911-019-0845-5.

Authors

Saeed Al-Azazi¹ ², Alexander Singer³, Rasheda Rabbani¹ ², Lisa M Lix⁴ ⁵

Affiliations

¹ Department of Community Health Sciences, University of Manitoba, S113-750 Bannatyne Avenue, Winnipeg, MB, R3E 0W3, Canada.
² George & Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, MB, Canada.
³ Department of Family Medicine, University of Manitoba, Winnipeg, MB, Canada.
⁴ Department of Community Health Sciences, University of Manitoba, S113-750 Bannatyne Avenue, Winnipeg, MB, R3E 0W3, Canada.
⁵ George & Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, MB, Canada.

Abstract

Background: Administrative health records (AHRs) and electronic medical records (EMRs) are two key sources of population-based data for disease surveillance, but misclassification errors in the data can bias disease estimates. Methods that combine information from error-prone data sources can build on the strengths of AHRs and EMRs. We compared bias and error for four data-combining methods and applied them to estimate hypertension prevalence.

Methods: Our study included four methods: the rule-based OR and AND methods, which identify disease cases from either or both data sources, respectively; the rule-based sensitivity-specificity adjusted (RSSA) method, which corrects for inaccuracies using a deterministic rule; and the probabilistic-based sensitivity-specificity adjusted (PSSA) method, which corrects for error using a statistical model. Computer simulation was used to estimate relative bias (RB) and mean square error (MSE) under varying conditions of population disease prevalence, correlation amongst data sources, and amount of misclassification error. AHRs and EMRs for Manitoba, Canada were used to estimate hypertension prevalence using validated case definitions and multiple disease markers.
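The rule-based methods and the two simulation metrics described above can be illustrated with a minimal Python sketch. The function names and toy data are hypothetical. The abstract does not give the deterministic rule used by the RSSA method, so the adjustment shown here is the standard Rogan-Gladen correction, offered only as an assumed example of a sensitivity-specificity adjustment and not as the authors' implementation.

```python
from statistics import mean


def prevalence(flags):
    """Proportion of individuals flagged as disease cases (flags are 0/1)."""
    return sum(flags) / len(flags)


def combine_or(ahr, emr):
    """OR rule: a case if flagged in either data source."""
    return [int(a or e) for a, e in zip(ahr, emr)]


def combine_and(ahr, emr):
    """AND rule: a case only if flagged in both data sources."""
    return [int(a and e) for a, e in zip(ahr, emr)]


def adjust_prevalence(observed, sensitivity, specificity):
    """Rogan-Gladen correction of an observed prevalence.

    ASSUMPTION: used here as a generic sensitivity-specificity adjustment;
    the source does not state the exact rule behind RSSA.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed + specificity - 1) / denom
    return min(1.0, max(0.0, corrected))  # keep the estimate in [0, 1]


def relative_bias(estimates, true_value):
    """Relative bias: (mean estimate - true value) / true value."""
    return (mean(estimates) - true_value) / true_value


def mean_square_error(estimates, true_value):
    """Mean squared deviation of the estimates from the true value."""
    return mean((x - true_value) ** 2 for x in estimates)


# Toy case flags for ten individuals (hypothetical data).
ahr = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
emr = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

or_prev = prevalence(combine_or(ahr, emr))    # flagged in at least one source
and_prev = prevalence(combine_and(ahr, emr))  # flagged in both sources
```

With these toy flags the OR rule can never yield a lower prevalence than either source alone, and the AND rule can never yield a higher one, which is why the two rules bracket the single-source estimates.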

Results: The OR method had the lowest RB and MSE when population disease prevalence was 10%, and the RSSA method had the lowest RB and MSE when population prevalence increased to 20%. As the correlation between data sources increased, the OR method resulted in the lowest RB and MSE. Estimates of hypertension prevalence for AHRs and EMRs alone were 30.9% (95% CI: 30.6-31.2) and 24.9% (95% CI: 24.6-25.2), respectively. The estimates were 21.4% (95% CI: 21.1-21.7) for the AND method, 34.4% (95% CI: 34.1-34.8) for the OR method, 32.2% (95% CI: 31.8-32.6) for the RSSA method, and ranged from 34.3% (95% CI: 34.1-34.5) to 35.9% (95% CI: 35.7-36.1) for the PSSA method, depending on the statistical model.
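As a quick arithmetic check on the reported point estimates, the inclusion-exclusion identity P(A or E) = P(A) + P(E) - P(A and E) can be applied to the single-source and AND figures. The sketch below assumes that all four estimates were computed on the same cohort, which the abstract does not state explicitly, and it uses the rounded percentages as published; the variable names are illustrative.

```python
# Reported hypertension prevalence estimates (percent), taken from the abstract.
ahr_alone = 30.9   # administrative health records only
emr_alone = 24.9   # electronic medical records only
and_method = 21.4  # flagged in both sources
or_reported = 34.4 # flagged in either source

# Inclusion-exclusion: cases in either source = AHR + EMR - cases in both.
# ASSUMPTION: all estimates share one denominator (same cohort).
or_implied = round(ahr_alone + emr_alone - and_method, 1)

print(f"OR implied by inclusion-exclusion: {or_implied}%")
print(f"OR reported in the abstract:       {or_reported}%")
```

Under that assumption the implied OR estimate agrees with the reported 34.4% to one decimal place.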

Conclusions: The OR and AND methods are influenced by correlation amongst the data sources, while the RSSA method is dependent on the accuracy of prior sensitivity and specificity estimates. The PSSA method performed well when population prevalence was high and the average correlation amongst disease markers was low. This study will guide researchers in selecting a data-combining method that best suits their data characteristics.

Keywords: Administrative data; Electronic medical records; Misclassification bias; Prevalence; Statistical model.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Aged
Bias
Canada
Computer Simulation
Electronic Health Records*
Female
Humans
Hypertension / epidemiology*
Information Storage and Retrieval
Male
Middle Aged
Population Surveillance*
Prevalence
Sensitivity and Specificity
Young Adult

Grants and funding

143293/CIHR/Canada