DOME: Directional medical embedding vectors from electronic health records

J Biomed Inform. 2025 Jan 2:104768. doi: 10.1016/j.jbi.2024.104768. Online ahead of print.

Abstract

Motivation: The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning have produced effective large-scale representations of EHR concepts, along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training on patient-level data, which limits their ability to extend training to multi-institutional EHR data. Scalable approaches that require only summary-level data, on the other hand, do not incorporate temporal dependencies between concepts.

Methods: We introduce a DirectiOnal Medical Embedding (DOME) algorithm to encode temporally directional relationships between medical concepts using summary-level EHR data. Specifically, DOME first aggregates patient-level EHR data into an asymmetric co-occurrence matrix. It then computes two positive pointwise mutual information (PPMI) matrices that encode the pairwise prior and posterior dependencies, respectively. Finally, a joint matrix factorization is performed on the two PPMI matrices, yielding three vectors for each concept: a semantic embedding and two directional context embeddings. Together, these vectors provide a comprehensive depiction of the temporal relationships between EHR concepts.
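
The first two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy patient records, the 365-day window, and the function names `directed_cooccurrence` and `ppmi` are all assumptions made for illustration, and the abstract does not specify how DOME defines its two PPMI matrices or its joint factorization. The sketch only shows how ordered pair counts become an asymmetric summary-level matrix and how a standard PPMI transform applies to it.

```python
import math
from collections import defaultdict


def directed_cooccurrence(patients, window):
    """Count ordered pairs (a, b) where a is recorded strictly before b,
    at most `window` days apart. Only these aggregate counts, not the
    patient-level records, would need to be shared."""
    counts = defaultdict(int)
    for events in patients:
        events = sorted(events)  # sort (day, code) tuples by day
        for i, (t_a, a) in enumerate(events):
            for t_b, b in events[i + 1:]:
                if t_b - t_a > window:
                    break  # later events are even further away
                if t_b > t_a and a != b:
                    counts[(a, b)] += 1
    return dict(counts)


def ppmi(counts):
    """Standard positive PMI on a sparse count matrix keyed by (row, col).
    Entries with non-positive PMI are dropped (treated as zero)."""
    total = sum(counts.values())
    row, col = defaultdict(int), defaultdict(int)
    for (a, b), c in counts.items():
        row[a] += c
        col[b] += c
    scores = {}
    for (a, b), c in counts.items():
        pmi = math.log(c * total / (row[a] * col[b]))
        if pmi > 0:
            scores[(a, b)] = pmi
    return scores


# Hypothetical toy data: each patient is a list of (day, concept) events.
patients = [
    [(0, "smoking"), (100, "lung_cancer")],
    [(0, "smoking"), (50, "copd"), (200, "lung_cancer")],
    [(0, "copd"), (30, "smoking")],
]

counts = directed_cooccurrence(patients, window=365)
scores = ppmi(counts)

# The count matrix is asymmetric: smoking precedes lung_cancer twice,
# while lung_cancer never precedes smoking in this toy data.
print(counts[("smoking", "lung_cancer")], counts.get(("lung_cancer", "smoking"), 0))
```

In this sketch a row of the count matrix describes what tends to follow a concept and a column describes what tends to precede it. Factorizing the resulting matrices into a shared semantic embedding and two directional context embeddings is the step the sketch leaves out.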

Results: We highlight the advantages and translational potential of DOME through three sets of validation studies. First, DOME consistently improves on existing direction-agnostic embedding vectors for disease risk prediction across several diseases; for lung cancer, for example, it increases the area under the receiver operating characteristic curve (AUROC) by 8.1%. Second, DOME excels at directional drug-disease relationship inference, successfully differentiating between drug side effects and indications and improving on state-of-the-art methods by 6.2% and 5.5% in AUROC, respectively. Finally, DOME effectively constructs directional knowledge graphs that distinguish disease risk factors from comorbidities, thereby revealing disease progression trajectories. The source code is available at https://github.com/celehs/Directional-EHR-embedding.

Keywords: Directional medical embedding; Drug-disease relationship; Electronic health records.