Initializing a hospital-wide data quality program. The AP-HP experience

Comput Methods Programs Biomed. 2019 Nov:181:104804. doi: 10.1016/j.cmpb.2018.10.016. Epub 2018 Nov 9.

Abstract

Background and objectives: Data Quality (DQ) programs are recognized as a critical aspect of new-generation research platforms using electronic health record (EHR) data for building Learning Healthcare Systems. The AP-HP Clinical Data Repository aggregates EHR data from 37 hospitals to enable large-scale research and secondary data analysis. This paper describes the DQ program currently in place at AP-HP and the lessons learned from two DQ campaigns initiated in 2017.

Materials and methods: As part of the AP-HP DQ program, two domains - patient identification (PI) and healthcare services (HS) - were selected for conducting DQ campaigns consisting of five phases: defining the scope, measuring, analyzing, improving and controlling DQ. Semi-automated DQ profiling was conducted on two data sets - the PI data set containing 8.8 million patients and the HS data set containing 13,099 consultation agendas and 2122 care units. Seventeen DQ measures were defined and DQ issues were classified using a unified DQ reporting framework. For each domain, action plans were defined for improving and monitoring prioritized DQ issues.
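As an illustration only, the kind of profiling described above can be sketched as a few measures computed over patient identification records and grouped by completeness, conformance and plausibility. This is a minimal sketch under stated assumptions, not the AP-HP tooling: the field names, the allowed value set for sex, the 120-year age limit and the sample records are all hypothetical, and none of the seventeen measures from the study are reproduced here.

```python
import re
from datetime import date

# Hypothetical patient-identification records; field names and values are
# illustrative assumptions, not the AP-HP schema.
RECORDS = [
    {"patient_id": "P001", "birth_date": "1980-05-17", "sex": "F", "last_name": "MARTIN"},
    {"patient_id": "P002", "birth_date": "", "sex": "M", "last_name": "DURAND"},
    {"patient_id": "P003", "birth_date": "1890-01-01", "sex": "X", "last_name": ""},
    {"patient_id": "P004", "birth_date": "2031-02-30", "sex": "M", "last_name": "PETIT"},
]

ALLOWED_SEX = {"F", "M"}  # assumed value set
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_complete(record, field):
    """Completeness: the field is present and not blank."""
    return bool(record.get(field, "").strip())


def conforms_sex(record):
    """Conformance: the value belongs to the assumed allowed value set."""
    return record.get("sex") in ALLOWED_SEX


def parse_date(value):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    if not DATE_PATTERN.match(value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:  # well-formed string but impossible date, e.g. 30 February
        return None


def plausible_birth_date(record, today, max_age_years=120):
    """Plausibility: the birth date is a real date, not in the future,
    and implies an age no greater than max_age_years (assumed limit)."""
    born = parse_date(record.get("birth_date", ""))
    if born is None:
        return False
    age_years = (today - born).days / 365.25
    return 0 <= age_years <= max_age_years


def profile(records, today):
    """Compute each measure as the fraction of all records passing the check.
    Missing values count as failures here, which is a simplifying choice."""
    n = len(records)
    return {
        "completeness.birth_date": sum(is_complete(r, "birth_date") for r in records) / n,
        "completeness.last_name": sum(is_complete(r, "last_name") for r in records) / n,
        "conformance.sex": sum(conforms_sex(r) for r in records) / n,
        "plausibility.birth_date": sum(plausible_birth_date(r, today) for r in records) / n,
    }


if __name__ == "__main__":
    for measure, score in profile(RECORDS, today=date(2019, 1, 1)).items():
        print(f"{measure}: {score:.2f}")
```

Prefixing each measure name with its category keeps the results reportable with a shared vocabulary; a real program would also need to decide whether plausibility is scored over all records or only over those with a non-missing value.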

Results: Eleven DQ issues were identified (8 for the PI data set and 3 for the HS data set) and categorized as completeness (n = 6), conformance (n = 3) and plausibility (n = 2) issues. DQ issues were caused by errors from data originators, ETL issues or limitations of the EHR data entry tool. The action plans included sixteen actions (9 for the PI domain and 7 for the HS domain). Although the action plans have been only partially implemented, the DQ campaigns have already resulted in significant improvement of DQ measures.

Conclusion: DQ assessments of hospital information systems are largely unpublished. The preliminary results of two DQ campaigns conducted at AP-HP illustrate the benefit of engaging in a DQ program. The adoption of a unified DQ reporting framework enables the communication of DQ findings in a well-defined manner with a shared vocabulary. Dedicated tooling is needed to automate and extend the scope of the generic DQ program. Specific DQ checks will additionally be defined on a per-study basis to evaluate whether EHR data is fit for specific uses.

Keywords: Data accuracy; Data quality; Data warehousing; Electronic health records; Observational Studies as Topic.

MeSH terms

  • Data Accuracy*
  • Data Warehousing
  • Databases, Factual
  • Decision Support Systems, Clinical
  • Electronic Health Records / standards*
  • France / epidemiology
  • Hospitals / standards*
  • Humans
  • Interdisciplinary Communication
  • Learning Health System
  • Medical Informatics
  • Observational Studies as Topic
  • Quality Assurance, Health Care*