A Comparison between Human and NLP-based Annotation of Clinical Trial Eligibility Criteria Text Using the OMOP Common Data Model

AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:394-403. eCollection 2021.

Abstract

Human annotations are the established gold standard for evaluating natural language processing (NLP) methods. The goals of this study are to quantify and qualify the disagreement between human and NLP annotations. We developed an NLP system for annotating clinical trial eligibility criteria text and constructed a manually annotated corpus, both following the OMOP Common Data Model (CDM). We analyzed the discrepancies between the human and NLP annotations and their causes (e.g., ambiguities in concept categorization and tacit decisions on the inclusion of qualifiers and temporal attributes during concept annotation). This study reports complexities in clinical trial eligibility criteria text that complicate NLP, as well as limitations of the OMOP CDM. The disagreement between human and NLP annotations may be generalizable. We discuss implications for NLP evaluation.
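To make the comparison concrete, the sketch below shows one common way such disagreement can be quantified: exact-match precision, recall, and F1 between human (gold) and NLP annotation sets, with leftover NLP-only annotations surfaced as candidate disagreements. This is a minimal illustration, not the authors' pipeline; the `Annotation` structure, the example criterion text, the character offsets, and the OMOP-style domain labels are all assumptions made for the example.

```python
# Minimal sketch (not the authors' method) of quantifying agreement between
# human and NLP annotations of eligibility criteria text.
# Annotation fields, offsets, and domain labels below are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so sets work
class Annotation:
    start: int      # character offset where the concept mention begins
    end: int        # character offset where the concept mention ends
    domain: str     # assumed OMOP CDM domain, e.g. "Condition", "Drug"

def agreement(human: set, nlp: set):
    """Exact-match precision/recall/F1, treating human annotations as gold."""
    tp = len(human & nlp)
    precision = tp / len(nlp) if nlp else 0.0
    recall = tp / len(human) if human else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations of: "Type 2 diabetes on metformin for >= 6 months"
human = {Annotation(0, 15, "Condition"), Annotation(19, 28, "Drug")}
nlp = {Annotation(0, 15, "Condition"),
       Annotation(19, 28, "Drug"),
       Annotation(33, 44, "Observation")}  # extra temporal qualifier

p, r, f = agreement(human, nlp)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
for extra in nlp - human:
    print("NLP-only annotation (candidate disagreement):", extra)
```

In this toy case the NLP system annotates a temporal qualifier the human annotator left implicit, which mirrors the kind of tacit-decision discrepancy the abstract describes: exact-match scoring penalizes it, so error analysis of the leftover annotations is needed to separate genuine mistakes from defensible annotation choices.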

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Humans
  • Natural Language Processing*