Acquiring Plausible Predications from MEDLINE by Clustering MeSH Annotations

Stud Health Technol Inform. 2015:216:716-20.

Abstract

The massive accumulation of biomedical knowledge is reflected in the growth of the literature database MEDLINE, which holds over 23 million bibliographic records. All records are manually indexed with MeSH descriptors, many of them refined by MeSH subheadings. We use subheading information to cluster types of MeSH descriptor co-occurrences in MEDLINE, processing co-occurrence information provided by the UMLS. The goal is to assign a plausible predicate to each resulting cluster. In an initial experiment, disease-pharmacologic substance co-occurrences were grouped into six clusters, and a domain expert then manually assigned meaningful predicates to the clusters. The mean accuracy of the ten best generated biomedical facts per cluster was 85%. This result supports the potential of MeSH subheadings for extracting plausible medical predications from MEDLINE.
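The clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here as an assumption, and the counts in TOY_COUNTS are invented for illustration rather than taken from the UMLS co-occurrence data. Each disease-substance pair is represented by the relative frequency of the subheading combinations attached to it, and pairs with similar profiles end up in the same cluster. The sketch uses three clusters for its toy data, whereas the experiment used six.

```python
from math import dist

# Invented toy counts (NOT real MEDLINE/UMLS data): for each
# (disease, substance) descriptor pair, how often a combination of
# "disease subheading|substance subheading" was observed.
TOY_COUNTS = {
    ("Hypertension", "Captopril"): {
        "drug therapy|therapeutic use": 40,
        "chemically induced|adverse effects": 2,
    },
    ("Asthma", "Albuterol"): {
        "drug therapy|therapeutic use": 35,
        "chemically induced|adverse effects": 1,
    },
    ("Agranulocytosis", "Clozapine"): {
        "chemically induced|adverse effects": 30,
        "drug therapy|therapeutic use": 1,
    },
    ("Liver Failure", "Acetaminophen"): {
        "chemically induced|adverse effects": 25,
        "chemically induced|poisoning": 5,
    },
    ("Rickets", "Vitamin D"): {
        "prevention & control|therapeutic use": 20,
        "drug therapy|therapeutic use": 6,
    },
    ("Neural Tube Defects", "Folic Acid"): {
        "prevention & control|therapeutic use": 18,
        "drug therapy|therapeutic use": 2,
    },
}


def to_profiles(counts):
    """Turn raw counts into relative-frequency vectors over all features."""
    features = sorted({f for c in counts.values() for f in c})
    profiles = {}
    for pair, c in counts.items():
        total = sum(c.values())
        profiles[pair] = [c.get(f, 0) / total for f in features]
    return features, profiles


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_pairs(counts, k):
    """Map each descriptor pair to a cluster label."""
    _, profiles = to_profiles(counts)
    pairs = list(profiles)
    labels = kmeans([profiles[p] for p in pairs], k)
    return dict(zip(pairs, labels))


clusters = cluster_pairs(TOY_COUNTS, k=3)
for pair, label in sorted(clusters.items(), key=lambda item: item[1]):
    print(label, pair)
```

In this toy setting the three clusters would be candidates for predicates such as "treats", "causes" and "prevents". As in the abstract, choosing the predicate for a cluster remains a manual step for a domain expert.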

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Data Mining / methods
  • Knowledge Bases*
  • MEDLINE / statistics & numerical data*
  • Machine Learning
  • Medical Subject Headings*
  • Natural Language Processing*
  • Periodicals as Topic / statistics & numerical data*
  • Terminology as Topic