Learning Contextual Hierarchical Structure of Medical Concepts with Poincaré Embeddings to Clarify Phenotypes

Brett K Beaulieu-Jones; Isaac S Kohane; Andrew L Beam

Pac Symp Biocomput. 2019;24:8-17.

Authors

Brett K Beaulieu-Jones¹, Isaac S Kohane, Andrew L Beam

Affiliation

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA, dbmi.hms.harvard.edu.

PMID: 30864306
PMCID: PMC6417814

Abstract

Biomedical association studies increasingly use clinical concepts, in particular diagnostic codes from clinical data repositories, as phenotypes. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow for comparison between clinical concepts and provide straightforward input to machine learning models. With traditional approaches, good representations require high dimensionality, which makes downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.
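
The abstract does not spell out how distances are measured in the hyperbolic space. As a minimal sketch, assuming the standard Poincaré ball model (every point lies strictly inside the unit disk), the distance between two 2-dimensional embeddings could be computed as below. The function name `poincare_distance` and the sample coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball.

    Standard Poincare ball formula (an assumption; the abstract gives no formula):
        d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical 2-dimensional coordinates, chosen only for illustration.
root = (0.0, 0.0)    # a point at the origin
leaf_a = (0.9, 0.0)  # a point near the boundary
leaf_b = (-0.9, 0.0) # another point near the boundary, on the opposite side

print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

Distances grow rapidly toward the boundary of the disk, so two points near the edge are much farther apart than their Euclidean separation suggests. This property is what lets a low-dimensional hyperbolic space accommodate tree-like hierarchies.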

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Computational Biology / methods*
Databases, Factual
Deep Learning*
Humans
International Classification of Diseases
Machine Learning
Medical Informatics
Natural Language Processing
Phenotype
Semantics

Grants and funding

T15 LM007092/LM/NLM NIH HHS/United States