SimText: a text mining framework for interactive analysis and visualization of similarities among biomedical entities

Bioinformatics. 2021 Nov 18;37(22):4285-4287. doi: 10.1093/bioinformatics/btab365.

Abstract

Summary: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the text-based analysis of similarities among a set of entities. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) analysis and visualization of the resulting data with unsupervised learning techniques in an interactive app.
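
As a rough illustration of what text-based similarity among entities can mean, the following minimal sketch compares entities by the overlap of their associated words. It is not SimText's implementation: the entity names, the word sets, the choice of Jaccard similarity, and the helper functions `jaccard`, `similarity_matrix` and `most_similar_pair` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data: words associated with each entity.
# These are invented for illustration, not real PubMed-derived output.
entity_words = {
    "GENE_A": {"epilepsy", "seizure", "channel", "sodium"},
    "GENE_B": {"epilepsy", "seizure", "channel", "potassium"},
    "GENE_C": {"tumor", "kinase", "proliferation", "mutation"},
}


def jaccard(a, b):
    """Jaccard similarity of two word sets: shared words / all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(words_by_entity):
    """Return a symmetric dict mapping (entity, entity) to similarity."""
    names = sorted(words_by_entity)
    sim = {(n, n): 1.0 for n in names}
    for x, y in combinations(names, 2):
        s = jaccard(words_by_entity[x], words_by_entity[y])
        sim[(x, y)] = sim[(y, x)] = s
    return sim


def most_similar_pair(sim):
    """Return the pair of distinct entities with the highest similarity."""
    pairs = {k: v for k, v in sim.items() if k[0] < k[1]}
    return max(pairs, key=pairs.get)


sim = similarity_matrix(entity_words)
print(most_similar_pair(sim))
```

A pairwise matrix of this kind is the sort of input that unsupervised methods such as hierarchical clustering typically operate on; which measures and methods SimText itself applies is described in the article and its supplementary data, not here.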

Availability and implementation: We developed SimText as open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform, with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as a Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags).

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Data Analysis
  • Data Interpretation, Statistical
  • Data Mining* / methods
  • PubMed
  • Software*