Introduction: In this work, we introduce the concept of semantic role labeling to the medical domain. We report first results of porting and adapting an existing resource, Propbank, to the medical field. Propbank is an adjunct to Penn Treebank that provides semantic annotation of predicates and the roles played by their arguments. The main aim of this work is the applicability of the Propbank frame files to predicates typically encountered in the medical literature.
Methods: We analyzed a target corpus of 610,100 abstracts, which was selected by searching for publication type "case reports". From this target corpus, we randomly selected 10,000 sample abstracts to estimate the predicate distribution, and matched the predicates from this sample to the predicates in Propbank.
Results: Of the 1998 unique verbs in our sample, 76% were represented in Propbank. This included the 40 most frequent verbs, which represented 49% of all predicate instances in our sample and which matched the Propbank usage in a study of representative sentences. We propose extensions to Propbank that handle medical predicates, which are not adequately covered by Propbank.
Conclusion: We believe that semantic role labeling using Propbank is a valid approach to capture predicate relations in the medical literature.