Adapting electronic health records-derived phenotypes to claims data: Lessons learned in using limited clinical data for phenotyping

J Biomed Inform. 2020 Feb:102:103363. doi: 10.1016/j.jbi.2019.103363. Epub 2019 Dec 19.

Abstract

Algorithms for identifying patients of interest from observational data must address missing and inaccurate data, and should ideally achieve comparable performance on both administrative claims and electronic health records data. However, administrative claims data lack the information needed to develop accurate algorithms for disorders that require laboratory results, and this omission can make diagnostic code-based algorithms insensitive. In this paper, we tested our assertion that the performance of a diagnosis code-based algorithm for chronic kidney disorder (CKD) can be improved by adding other codes indirectly related to CKD (e.g., codes for dialysis, kidney transplant, suspicious kidney disorders). Following the best practices from Observational Health Data Sciences and Informatics (OHDSI), we adapted an electronic health record-based gold standard algorithm for CKD and then created algorithms that can be executed on administrative claims data and that account for related data quality issues. We externally validated our algorithms on four electronic health record datasets in the OHDSI network. Compared to the algorithm that uses CKD diagnostic codes only, the positive predictive value of the algorithms that use additional codes was slightly higher (47.9-48.5% vs. 47.4%). The algorithms adapted from the gold standard algorithm can be used to infer chronic kidney disorder from administrative claims data. We improved the generalizability and consistency of the CKD phenotypes by using data and vocabulary standardized across the OHDSI network, although performance variability across datasets remains. We showed that identifying and addressing coding and data heterogeneity can improve the performance of the algorithms.
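The comparison described in the abstract, a diagnosis-code-only algorithm versus one that also uses indirectly related codes, each scored by positive predictive value against a gold standard, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the code labels (`CKD_DX`, `DIALYSIS`, `KIDNEY_TRANSPLANT`), the function names, and the toy patient records are hypothetical and are not the study's concept sets, data, or implementation. The toy numbers do not reflect the reported results.

```python
from typing import Dict, Set

# Hypothetical code labels for illustration only; the study's actual
# concept sets (standardized OHDSI vocabulary) are not reproduced here.
CKD_DX_CODES: Set[str] = {"CKD_DX"}
RELATED_CODES: Set[str] = {"DIALYSIS", "KIDNEY_TRANSPLANT"}


def flag_patients(records: Dict[str, Set[str]], codes: Set[str]) -> Set[str]:
    """Return the IDs of patients whose record contains at least one target code."""
    return {pid for pid, patient_codes in records.items() if patient_codes & codes}


def positive_predictive_value(flagged: Set[str], true_cases: Set[str]) -> float:
    """PPV = true positives / all patients flagged by the algorithm."""
    if not flagged:
        raise ValueError("PPV is undefined when no patients are flagged")
    return len(flagged & true_cases) / len(flagged)


# Toy claims-like records (invented): patient ID -> set of codes on record.
records = {
    "p1": {"CKD_DX"},
    "p2": {"DIALYSIS"},
    "p3": {"CKD_DX"},
    "p4": {"OTHER"},
    "p5": {"KIDNEY_TRANSPLANT"},
}
# Toy gold-standard labels (invented), standing in for the EHR-based algorithm.
true_cases = {"p1", "p2", "p5"}

narrow = flag_patients(records, CKD_DX_CODES)                  # diagnosis codes only
broad = flag_patients(records, CKD_DX_CODES | RELATED_CODES)   # plus related codes

print(positive_predictive_value(narrow, true_cases))  # 0.5
print(positive_predictive_value(broad, true_cases))   # 0.75
```

In this toy example the broader code set raises PPV sharply only because the invented data were chosen to make the mechanism visible; the abstract reports a much smaller difference.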

Keywords: Chronic kidney disorder; Data quality; Observational Health Data Sciences and Informatics (OHDSI); Phenotyping; Portability; Reproducibility.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Electronic Health Records*
  • Humans
  • Medical Informatics*
  • Phenotype
  • Predictive Value of Tests