Evaluating natural language processing (NLP) systems in the clinical domain is a difficult task, but one that is important for the advancement of the field. A number of NLP systems that extract information from free-text clinical reports have been reported, yet few of these systems have been evaluated. Those that were evaluated generally reported good performance, but the results were often weakened by shortcomings in the evaluation methods. In this paper we describe a set of criteria aimed at improving the quality of NLP evaluation studies. We present an overview of NLP evaluations in the clinical domain and also discuss the Message Understanding Conferences (MUC) [1-4]. Although these conferences constitute a series of NLP evaluation studies performed outside the clinical domain, some of their results are relevant within medicine. In addition, we discuss a number of factors that contribute to the complexity inherent in evaluating natural language systems.