Facilitating cancer research using natural language processing of pathology reports

Stud Health Technol Inform. 2004;107(Pt 1):565-72.

Abstract

Many ongoing clinical research projects, including cancer studies, rely on manual capture of information from surgical pathology reports, both to determine the eligibility of recruited patients and to provide other information, such as cancer prognosis. Natural language processing (NLP) systems offer an automated alternative to manual coding, but pathology reports have certain features that are difficult for NLP systems to handle. This paper describes how a preprocessor was integrated with an existing NLP system (MedLEE) to reduce the modifications needed to the NLP system and to improve performance. The work was done in conjunction with an ongoing clinical research project that assesses disparities in, and risks of, developing breast cancer among minority women. The system was evaluated using manually coded data from the research project's database as a gold standard. The extended NLP system achieved a sensitivity of 90.6% and a precision of 91.6%. These results indicate that the system performed satisfactorily in capturing information for the cancer research project.
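
As an illustration of the two evaluation measures named in the abstract, the following is a minimal sketch of how sensitivity and precision can be computed by comparing extracted findings against a manually coded gold standard. It is not the authors' evaluation code. The function name, the (report ID, finding) pair representation, and the toy data are all assumptions made for illustration, and the toy values are unrelated to the 90.6% and 91.6% reported in the study.

```python
def sensitivity_and_precision(gold, extracted):
    """Compare extracted findings against a gold standard.

    Both arguments are iterables of hashable items; here each item is a
    hypothetical (report_id, finding) pair.
    """
    gold = set(gold)
    extracted = set(extracted)
    true_positives = len(gold & extracted)
    # Sensitivity (recall): share of gold-standard findings the system captured.
    sensitivity = true_positives / len(gold) if gold else 0.0
    # Precision: share of system-extracted findings that match the gold standard.
    precision = true_positives / len(extracted) if extracted else 0.0
    return sensitivity, precision


# Hypothetical toy data, invented purely to exercise the function.
gold_standard = {
    ("report1", "invasive ductal carcinoma"),
    ("report1", "tumor size"),
    ("report2", "lymph node involvement"),
    ("report3", "ductal carcinoma in situ"),
}
system_output = {
    ("report1", "invasive ductal carcinoma"),
    ("report2", "lymph node involvement"),
    ("report3", "ductal carcinoma in situ"),
    ("report3", "tumor size"),  # not in the gold standard: a false positive
}

sens, prec = sensitivity_and_precision(gold_standard, system_output)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this sketch a finding counts as correct only when it matches a gold-standard entry exactly; a real evaluation would need a defined policy for partial or near matches.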

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Biomedical Research
  • Breast Neoplasms / ethnology
  • Breast Neoplasms / pathology*
  • Feasibility Studies
  • Female
  • Humans
  • Medical Oncology
  • Medical Records
  • Minority Groups
  • Natural Language Processing*
  • Pathology, Surgical / classification*
  • Risk Factors