[Big data, medical language and biomedical terminology systems]

Stefan Schulz; Pablo López-García

doi:10.1007/s00103-015-2190-x

Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2015 Aug;58(8):844-852. doi: 10.1007/s00103-015-2190-x.

[Article in German]

Authors

Stefan Schulz¹, Pablo López-García²

Affiliations

¹ Institut für Medizinische Informatik, Statistik und Dokumentation, Medizinische Universität Graz, Auenbruggerplatz 2/V, 8036, Graz, Österreich.
² Institut für Medizinische Informatik, Statistik und Dokumentation, Medizinische Universität Graz, Auenbruggerplatz 2/V, 8036, Graz, Österreich.

PMID: 26077872
DOI: 10.1007/s00103-015-2190-x

Abstract

A variety of rich terminology systems, such as thesauri, classifications, nomenclatures, and ontologies, support information and knowledge processing in health care and biomedical research. Nevertheless, human language, in the form of individually written texts, remains the primary carrier of information, both in the description of disease courses and treatment episodes in electronic medical records and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that abstracting from the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge from the textual data of biomedical research and clinical routine. Human language technologies are constantly evolving and are increasingly able to annotate narratives with codes from biomedical terminologies. However, this capability depends heavily on linguistic and terminological resources, whose creation and maintenance are labor-intensive. It is nevertheless reasonable to assume that big data methods can support this process. Examples include learning hierarchical relationships, grouping synonymous terms into concepts, and disambiguating homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.
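
As an illustration of what annotating narratives with terminology codes and grouping synonymous terms into concepts can look like in practice, the following is a minimal sketch and not the authors' method. It assumes a tiny hand-made dictionary; the concept codes (`C001`, `C002`), the term lists, and the function names `annotate` and `normalize` are hypothetical and do not come from any real terminology system.

```python
import re

# Hypothetical mini-terminology: each concept code groups synonymous terms.
# Codes and terms are illustrative assumptions, not real terminology content.
CONCEPTS = {
    "C001": ["myocardial infarction", "heart attack"],
    "C002": ["hypertension", "high blood pressure"],
}

# Invert the mapping so that every surface term points to its concept code.
TERM_TO_CODE = {
    term: code for code, terms in CONCEPTS.items() for term in terms
}

# Build one case-insensitive pattern; longer terms come first so that
# they take precedence over shorter alternatives at the same position.
PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(term)
        for term in sorted(TERM_TO_CODE, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def annotate(text):
    """Return (start, end, matched_text, code) for each dictionary term found."""
    return [
        (m.start(), m.end(), m.group(0), TERM_TO_CODE[m.group(0).lower()])
        for m in PATTERN.finditer(text)
    ]


def normalize(text):
    """Reduce a narrative to the sorted list of distinct concept codes it mentions."""
    return sorted({code for _, _, _, code in annotate(text)})


if __name__ == "__main__":
    note = "Patient had a heart attack; history of high blood pressure."
    for start, end, matched, code in annotate(note):
        print(f"{start}-{end}\t{matched}\t{code}")
    print(normalize(note))
```

Two differently worded narratives that mention the same conditions reduce to the same code list, which is the kind of normalized representation that statistical analysis can then work on. Exact dictionary lookup of this kind does not handle homonyms, negation, or spelling variants, which is where the linguistic and terminological resources discussed in the abstract come in.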

MeSH terms

Biological Ontologies / organization & administration*
Data Accuracy
Datasets as Topic / classification*
Datasets as Topic / statistics & numerical data*
Germany
Information Storage and Retrieval / standards
Medical Record Linkage / standards
Natural Language Processing*
Terminology as Topic*
Vocabulary, Controlled*