A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

J Am Med Inform Assoc. 2020 Apr 1;27(4):505-513. doi: 10.1093/jamia/ocz232.

Abstract

Objective: The disjointed healthcare system and the nonexistence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid record linkage method in a statewide surveillance system for congenital heart disease.

Materials and methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method using first name, last name, social security number, date of birth, and house number were initially implemented independently and then sequentially in a hybrid approach to assess RL performance.
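The sequential hybrid idea described above can be illustrated with a short sketch. It runs a deterministic pass first and then a probabilistic pass over only the pairs the rules left unlinked, using the five fields the study names (first name, last name, social security number, date of birth, house number). The abstract does not give the two deterministic rules or the probabilistic model, so the rules `rule_ssn_dob` and `rule_name_dob`, the Fellegi-Sunter-style weighting, the `M_U` probabilities, and the `threshold` value are all hypothetical placeholders chosen for illustration, not the authors' settings. Blocking is also omitted here for brevity.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Record:
    rec_id: str
    first: str
    last: str
    ssn: str    # "" when missing
    dob: str    # ISO date "YYYY-MM-DD", "" when missing
    house: str  # house number, "" when missing

FIELDS = ("first", "last", "ssn", "dob", "house")

def norm(value: str) -> str:
    """Trim whitespace and upper-case a field before comparison."""
    return value.strip().upper()

def same(a: Record, b: Record, field: str) -> bool:
    """True only when both values are present and identical after normalization."""
    va, vb = norm(getattr(a, field)), norm(getattr(b, field))
    return bool(va) and va == vb

# HYPOTHETICAL deterministic rules; the study's two rules are not specified in the abstract.
def rule_ssn_dob(a: Record, b: Record) -> bool:
    return same(a, b, "ssn") and same(a, b, "dob")

def rule_name_dob(a: Record, b: Record) -> bool:
    return same(a, b, "first") and same(a, b, "last") and same(a, b, "dob")

DETERMINISTIC_RULES = (rule_ssn_dob, rule_name_dob)

# HYPOTHETICAL (m, u) per field: m = P(agree | match), u = P(agree | non-match).
M_U = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "ssn": (0.98, 0.0001),
    "dob": (0.97, 0.003),
    "house": (0.85, 0.02),
}

def fs_weight(a: Record, b: Record) -> float:
    """Fellegi-Sunter-style log2 match weight; a field missing on either side adds 0."""
    total = 0.0
    for field in FIELDS:
        va, vb = norm(getattr(a, field)), norm(getattr(b, field))
        if not va or not vb:
            continue
        m, u = M_U[field]
        if va == vb:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def hybrid_link(records, threshold: float = 10.0) -> dict:
    """Deterministic pass first; probabilistic pass only on pairs the rules left unlinked.

    Returns {(rec_id_a, rec_id_b): "deterministic" | "probabilistic"}.
    Blocking is omitted for brevity, so every pair of records is compared.
    """
    links = {}
    unlinked = []
    for a, b in combinations(records, 2):
        if any(rule(a, b) for rule in DETERMINISTIC_RULES):
            links[(a.rec_id, b.rec_id)] = "deterministic"
        else:
            unlinked.append((a, b))
    for a, b in unlinked:
        if fs_weight(a, b) >= threshold:
            links[(a.rec_id, b.rec_id)] = "probabilistic"
    return links
```

In this toy setup, a pair such as "MARIA LOPEZ" (no SSN) versus "MARIE LOPEZ" (same date of birth) fails both exact rules because of the first-name typo, but still clears the probabilistic threshold on last name and date of birth. This is the kind of pair a deterministic-only pass would miss.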

Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine resulted in 6451 linkages and identified 18%-24% more correct linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
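The blocking scheme and the evaluation metrics mentioned above can be sketched briefly. Records are grouped by the first 3 characters of the last name plus gender, and only pairs within the same block are compared; linked pairs are then scored against a reference set of true pairs, with the F-measure taken as the harmonic mean of precision and recall. Stripping non-letters from the last name, the `"|"` key separator, the tuple record layout, and the gold-standard set `truth` are assumptions for illustration; the abstract does not describe how the reference pairs were established.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(last_name: str, gender: str) -> str:
    """First 3 letters of the upper-cased last name plus the gender code (assumed normalization)."""
    letters = "".join(ch for ch in last_name.upper() if ch.isalpha())
    return letters[:3] + "|" + gender.strip().upper()

def candidate_pairs(records):
    """records: iterable of (rec_id, last_name, gender). Returns sorted-ID pairs sharing a block."""
    blocks = defaultdict(list)
    for rec_id, last, gender in records:
        blocks[blocking_key(last, gender)].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs

def evaluate(predicted: set, truth: set):
    """Precision, recall and F-measure of predicted links against a reference set of true pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

One trade-off of this blocking choice: a true pair whose gender is recorded differently in two sources falls into different blocks and is never compared by the probabilistic pass, which lowers recall regardless of how well the match weights are tuned.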

Discussion: The hybrid approach resulted in increased linkage accuracy and identified linked record pairs that would otherwise have been missed by any independent linkage technique.

Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.

Keywords: congenital heart disease; data harmonization; hybrid; patient matching; record linkage.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Algorithms
  • Colorado
  • Electronic Health Records*
  • Heart Defects, Congenital*
  • Humans
  • Medical Record Linkage / methods*
  • Population Surveillance*
  • Probability