Automatic recognition of self-acknowledged limitations in clinical research literature

Halil Kilicoglu; Graciela Rosemblat; Mario Malicki; Gerben Ter Riet

doi:10.1093/jamia/ocy038

J Am Med Inform Assoc. 2018 Jul 1;25(7):855-861. doi: 10.1093/jamia/ocy038.

Authors

Halil Kilicoglu¹, Graciela Rosemblat¹, Mario Malicki²,³, Gerben Ter Riet²

Affiliations

¹ Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD, USA.
² Department of General Practice, Academic Medical Center, Amsterdam, The Netherlands.
³ Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia.

Abstract

Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.

Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).
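As an illustration of the semi-supervised idea described in the Methods, the following is a minimal sketch of self-training for binary sentence classification. It is not the authors' implementation: the bag-of-words features, the plain gradient-descent logistic regression, the confidence threshold of 0.9, and all function names (featurize, train_lr, self_train) are assumptions made for this example, since the abstract does not specify them.

```python
import math
from collections import Counter

def featurize(sentence):
    """Lowercased bag-of-words counts (an assumed stand-in for real features)."""
    return Counter(sentence.lower().split())

def predict_proba(weights, bias, feats):
    """Logistic-regression probability that a sentence states a limitation."""
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(examples, epochs=200, lr=0.5):
    """Fit LR with plain stochastic gradient descent on (sentence, label) pairs."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            feats = featurize(sentence)
            err = label - predict_proba(weights, bias, feats)
            bias += lr * err
            for tok, cnt in feats.items():
                weights[tok] = weights.get(tok, 0.0) + lr * err * cnt
    return weights, bias

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the training set with confidently pseudo-labeled sentences.

    Each round: train on the current labeled set, label the unlabeled
    sentences whose predicted probability is >= threshold (positive) or
    <= 1 - threshold (negative), move them into the training set, retrain.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train_lr(labeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for sentence in pool:
            p = predict_proba(*model, featurize(sentence))
            if p >= threshold:
                confident.append((sentence, 1))
            elif p <= 1.0 - threshold:
                confident.append((sentence, 0))
            else:
                remaining.append(sentence)
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = remaining
        model = train_lr(labeled)
    return model, labeled

# Toy usage with made-up sentences (1 = limitation, 0 = other).
seed = [
    ("a limitation of this study is the small sample", 1),
    ("our study has several limitations", 1),
    ("patients were randomized to two groups", 0),
    ("blood samples were collected at baseline", 0),
]
pool = ["another limitation is the short follow-up",
        "patients were collected at baseline"]
model, expanded = self_train(seed, pool)
```

The threshold controls the usual trade-off in self-training: a high value adds fewer pseudo-labeled sentences but keeps label noise low, while a low value expands the training set faster at the risk of reinforcing early mistakes.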

Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]).

Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.

Publication types

Research Support, N.I.H., Intramural

MeSH terms

Biomedical Research* / standards
Logistic Models
Machine Learning*
Natural Language Processing
PubMed
Publications / standards*
Support Vector Machine