Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

Ayush Singhal; Robert Leaman; Natalie Catlett; Thomas Lemberger; Johanna McEntyre; Shawn Polson; Ioannis Xenarios; Cecilia Arighi; Zhiyong Lu

doi:10.1093/database/baw161

Database (Oxford). 2016 Dec 26:2016:baw161. doi: 10.1093/database/baw161. Print 2016.

Authors

Ayush Singhal¹, Robert Leaman¹, Natalie Catlett², Thomas Lemberger³, Johanna McEntyre⁴, Shawn Polson⁵, Ioannis Xenarios⁶, Cecilia Arighi⁷,⁵, Zhiyong Lu⁷

Affiliations

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
² Selventa, Cambridge, MA 02140, USA.
³ EMBO, Meyerhofstrasse 1, Heidelberg 69117, Germany.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
⁵ Center for Bioinformatics and Computational Biology and Department of Computer and Information Sciences, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA.
⁶ SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
⁷ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue arising from the growing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, to sustain the curation ecosystem and have text-mining systems adopted for practical benefit, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.

Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
