Natural Language Processing for the Evaluation of Methodological Standards and Best Practices of EHR-based Clinical Research

AMIA Jt Summits Transl Sci Proc. 2020 May 30:2020:171-180. eCollection 2020.

Abstract

The effective use of EHR data for clinical research is challenged by a lack of methodologic standards, transparency, and reproducibility. For example, our empirical analysis of clinical research ontologies and reporting standards found few or no informatics-related standards. To address these issues, our study leverages natural language processing techniques to discover the reporting patterns and data abstraction methodologies of EHR-based clinical research. We conducted a case study using a collection of full-text articles of EHR-based population studies published using the Rochester Epidemiology Project infrastructure. Our investigation found an upward trend in the reporting of EHR-related research methodologies, good practices, and informatics-related methods. For example, among 1279 articles, 24.0% reported training for data abstraction, 6% reported that the abstractors were blinded, 4.5% tested inter-observer agreement, 5% reported the use of a screening/data collection protocol, 1.5% reported that team meetings were organized for consensus building, and 0.8% mentioned supervision activities by senior researchers. Despite this trend, the overall rate of reporting or adopting methodologic standards remained low, and clinical research reporting varied widely. We therefore recommend the continued development of process frameworks, ontologies, and reporting guidelines to promote good data practice in EHR-based clinical research.
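
As a rough illustration of how reporting rates like those above could be computed from article text, the following minimal sketch counts the share of articles that mention each practice. It is not the study's pipeline: the regular expressions, the category names in `PATTERNS`, and the `reporting_rates` function are hypothetical, and simple keyword matching stands in for whatever natural language processing techniques the authors actually used.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per reporting practice.
# These are illustrative assumptions, not the lexicon used in the study.
PATTERNS = {
    "abstractor_training": re.compile(r"\b(?:trained|training)\b[^.]*\babstract", re.I),
    "blinding": re.compile(r"\bblind(?:ed|ing)?\b", re.I),
    "inter_observer_agreement": re.compile(
        r"\binter-?\s?(?:observer|rater)\b|\bkappa\b", re.I
    ),
}


def reporting_rates(articles):
    """Return the fraction of articles that mention each practice at least once."""
    counts = Counter()
    for text in articles:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1  # count each article once per practice
    total = len(articles)
    return {name: (counts[name] / total if total else 0.0) for name in PATTERNS}


if __name__ == "__main__":
    sample = [
        "Nurse abstractors were trained to abstract records.",
        "Abstractors were blinded to case status. Kappa was 0.8.",
        "No methods reported.",
    ]
    for practice, rate in reporting_rates(sample).items():
        print(f"{practice}: {rate:.1%}")
```

Keyword matching of this kind tends to over-count (for example, "blinded" in a trial-design sense) and under-count paraphrases, so any real analysis would need manual validation of the matches.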