Identifying gene-specific variations in biomedical text

J Bioinform Comput Biol. 2007 Dec;5(6):1277-96. doi: 10.1142/s0219720007003156.

Abstract

The influence of genetic variations on diseases and cellular processes is the focus of many investigations, and the results of biomedical studies are often accessible only through scientific publications. Automatic extraction of this information requires recognizing both gene names and the accompanying allelic variant information. Previous work introduced the OSIRIS system, which detects allelic variation in text using a query expansion approach. The main limitations of that system are its relatively low recall for variation mentions and for gene name recognition. To address these limitations, we integrate the ProMiner system, developed for the recognition and normalization of gene and protein names, with a conditional random field (CRF)-based recognition of variation terms in biomedical text. A newly developed normalization of variation entities then allows us to link textual mentions to entries in the Single Nucleotide Polymorphism database (dbSNP). We evaluate the performance of this novel approach and report improved results in comparison to state-of-the-art systems.
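
To illustrate what normalizing a variation entity can mean, the following is a minimal sketch in Python. It is not the method described in the paper: it is a simple regular-expression stand-in, not a CRF, and it assumes only protein-level substitution mentions of the form "Ala123Thr" or "A123T". The function name `normalize_variation` and the tuple output format are hypothetical choices made for this example.

```python
import re

# Standard three-letter to one-letter amino acid abbreviations.
AA3_TO_1 = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}

# Assumption: a residue is written either as a three-letter or a one-letter code.
_RESIDUE = "|".join(AA3_TO_1) + "|[ACDEFGHIKLMNPQRSTVWY]"

# Hypothetical pattern covering only substitutions such as "Ala123Thr" or "A123T".
PATTERN = re.compile(
    rf"^(?P<ref>{_RESIDUE})\s*(?P<pos>\d+)\s*(?P<alt>{_RESIDUE})$",
    re.IGNORECASE,
)


def _one_letter(token):
    """Map a three-letter or one-letter residue token to its one-letter code."""
    token = token.lower()
    return AA3_TO_1.get(token, token.upper())


def normalize_variation(mention):
    """Return (reference residue, position, variant residue), or None if unmatched."""
    match = PATTERN.match(mention.strip())
    if match is None:
        return None
    return (
        _one_letter(match["ref"]),
        int(match["pos"]),
        _one_letter(match["alt"]),
    )
```

With this sketch, different surface forms of the same substitution map to one tuple, which, together with a normalized gene identifier, could serve as a lookup key against a variation database. How the actual system performs that linking is not specified in the abstract.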

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Computational Biology
  • Databases, Nucleic Acid
  • Genetic Variation*
  • Humans
  • Information Storage and Retrieval
  • MEDLINE
  • Models, Statistical
  • Polymorphism, Single Nucleotide
  • Terminology as Topic