Improving precision in concept normalization

Mayla Boguslav; K Bretonnel Cohen; William A Baumgartner; Lawrence E Hunter

Pac Symp Biocomput. 2018:23:566-577.

Authors

Mayla Boguslav¹, K Bretonnel Cohen, William A Baumgartner, Lawrence E Hunter

Affiliation

¹ Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA. compbio.ucdenver.edu

PMID: 29218915
PMCID: PMC5730334

Abstract

Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases, there are reasons to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision that uses a variety of pre-processing and post-processing methods. These methods draw on both knowledge-based and frequentist approaches to modeling language. Using an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
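
The abstract does not describe the individual methods, so the following is only a minimal sketch of one way a Zipfian distribution of false positives could be exploited as a post-processing step. It assumes that a small number of (mention, concept) pairs account for a large share of the errors, so blocking the most frequent ones raises precision. The annotation tuple layout, the function names, and the concept identifiers (`X:1` and so on) are hypothetical and are not taken from the paper or its pipeline.

```python
from collections import Counter

# Each annotation is a hypothetical tuple: (doc_id, offset, mention, concept_id).
# The concept IDs below are made up for illustration.
gold = {
    ("d1", 0, "cell", "X:1"),
    ("d1", 20, "neuron", "X:2"),
}

predictions = [
    ("d1", 0, "cell", "X:1"),     # true positive
    ("d1", 20, "neuron", "X:2"),  # true positive
    ("d1", 40, "can", "X:3"),     # false positive (frequent)
    ("d2", 5, "can", "X:3"),      # false positive (frequent)
    ("d2", 30, "can", "X:3"),     # false positive (frequent)
    ("d2", 50, "lead", "X:4"),    # false positive (rare)
]


def precision(predicted, gold_set):
    """Fraction of predicted annotations that also appear in the gold set."""
    if not predicted:
        return 0.0
    true_positives = sum(1 for p in predicted if p in gold_set)
    return true_positives / len(predicted)


def top_false_positives(predicted, gold_set, k):
    """Return the k most frequent (mention, concept_id) pairs among false positives."""
    counts = Counter((p[2], p[3]) for p in predicted if p not in gold_set)
    return [pair for pair, _ in counts.most_common(k)]


def filter_predictions(predicted, blocklist):
    """Drop every prediction whose (mention, concept_id) pair is blocked."""
    blocked = set(blocklist)
    return [p for p in predicted if (p[2], p[3]) not in blocked]


blocklist = top_false_positives(predictions, gold, k=1)
filtered = filter_predictions(predictions, blocklist)

print(f"precision before: {precision(predictions, gold):.2f}")  # 0.33
print(f"precision after:  {precision(filtered, gold):.2f}")     # 0.67
```

In this toy data, blocking a single pair removes three of the four false positives. In practice such a blocklist would be derived from a development set, and a blocked pair may also remove true positives, which costs recall.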

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Biological Ontologies / statistics & numerical data
Computational Biology / methods
Data Mining / methods
Electronic Health Records / statistics & numerical data
False Positive Reactions
Humans
Natural Language Processing*
Precision Medicine / statistics & numerical data
PubMed / statistics & numerical data
Reproducibility of Results
