An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)



J Am Med Inform Assoc. 2023 Nov 17;30(12):2036-2040. doi: 10.1093/jamia/ocad134.

Authors

Sijia Liu¹, Andrew Wen¹, Liwei Wang¹, Huan He¹, Sunyang Fu¹, Robert Miller², Andrew Williams², Daniel Harris³, Ramakanth Kavuluru³, Mei Liu⁴, Noor Abu-El-Rub⁴, Dalton Schutte⁵, Rui Zhang⁵, Masoud Rouhizadeh⁶, John D Osborne⁷, Yongqun He⁸, Umit Topaloglu⁹, Stephanie S Hong¹⁰, Joel H Saltz¹¹, Thomas Schaffter¹², Emily Pfaff¹³, Christopher G Chute¹⁰, Tim Duong¹⁴, Melissa A Haendel¹⁵, Rafael Fuentes¹⁶, Peter Szolovits¹⁷, Hua Xu¹⁸, Hongfang Liu¹,¹⁸

Affiliations

¹ Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA.
² Tufts Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA.
³ Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA.
⁴ Department of Internal Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA.
⁵ Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA.
⁶ Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, USA.
⁷ Department of Computer Science, University of Alabama at Birmingham, Birmingham, Alabama, USA.
⁸ Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA.
⁹ Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
¹⁰ Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
¹¹ Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
¹² Sage Bionetworks, Seattle, Washington, USA.
¹³ Department of Medicine, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, USA.
¹⁴ Department of Radiology, Albert Einstein College of Medicine, Bronx, New York, USA.
¹⁵ Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, Colorado, USA.
¹⁶ Alex Informatics, North Bethesda, Maryland, USA.
¹⁷ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
¹⁸ School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and variation in human factors. These same factors also substantially increase the difficulty of developing NLP models in multi-site settings, which are necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for extracting Coronavirus Disease 2019 (COVID-19) signs and symptoms within an open NLP framework, using data from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically demonstrate the benefits of multi-site data for both symbolic and statistical methods, and we underscore the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

Keywords: electronic health records; federated learning; multi-institutional data annotation; natural language processing.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
COVID-19*
Electronic Health Records
Humans
Natural Language Processing*
