Identifying patients with similar clinical profiles, and deriving insights from the records and outcomes of those patients, can support fast and precise diagnosis and other clinical decisions for rare diseases. Similarity methods must account for both the semantic relations between medical concepts and the differing relevance of the medical concepts present in patients' medical records. In this paper, we introduce methods developed for rare disease screening and diagnosis from a clinical data warehouse, using medical concept embeddings and adjusted aggregations. Our methods yielded better preliminary results than the baseline methods, with a significant improvement in precision among the top-ranked similar patients. These results are encouraging for further fine-tuning and for application to a large-scale dataset to identify new candidate patients.
Keywords: Electronic health records; Patient similarity; Rare disease diagnosis; Word embedding.
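The general idea of embedding-based patient similarity can be illustrated with a minimal sketch. The abstract does not specify the embedding model or the exact aggregation, so everything here is an assumption for illustration: concept embeddings are taken as precomputed, the "adjusted aggregation" is stood in for by an IDF-weighted average (rarer concepts count more), patients are ranked by cosine similarity, and precision@k measures how many of the top-ranked patients are relevant. The concept names, vectors, and patient IDs are hypothetical toy data.

```python
import math
from collections import Counter


def idf_weights(patients):
    """Smoothed IDF per concept (assumed weighting, not the paper's method)."""
    n = len(patients)
    df = Counter(c for concepts in patients.values() for c in set(concepts))
    return {c: math.log(n / d) + 1.0 for c, d in df.items()}


def patient_vector(concepts, embeddings, idf):
    """IDF-weighted mean of the embeddings of a patient's distinct concepts."""
    dim = len(next(iter(embeddings.values())))
    acc, total = [0.0] * dim, 0.0
    for c in sorted(set(concepts)):  # sorted for deterministic summation order
        if c not in embeddings:
            continue  # skip concepts with no embedding
        w = idf.get(c, 1.0)
        for i, x in enumerate(embeddings[c]):
            acc[i] += w * x
        total += w
    return [x / total for x in acc] if total > 0 else acc


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_similar(query_id, patients, embeddings):
    """Return (patient_id, score) pairs sorted by similarity to the query."""
    idf = idf_weights(patients)
    vecs = {pid: patient_vector(cs, embeddings, idf) for pid, cs in patients.items()}
    q = vecs[query_id]
    scores = [(pid, cosine(q, v)) for pid, v in vecs.items() if pid != query_id]
    return sorted(scores, key=lambda t: t[1], reverse=True)


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k ranked patients that are in the relevant set."""
    return sum(1 for pid in ranked_ids[:k] if pid in relevant) / k


# Hypothetical 2-D concept embeddings and patient records.
embeddings = {
    "fever": [1.0, 0.0], "rash": [0.9, 0.1], "arthralgia": [0.8, 0.2],
    "cough": [0.0, 1.0], "dyspnea": [0.1, 0.9],
}
patients = {
    "p1": ["fever", "rash"],
    "p2": ["fever", "rash", "arthralgia"],
    "p3": ["cough", "dyspnea"],
    "q": ["rash", "arthralgia"],
}

ranked_ids = [pid for pid, _ in rank_similar("q", patients, embeddings)]
print(ranked_ids)  # ['p2', 'p1', 'p3']
print(precision_at_k(ranked_ids, {"p2"}, 2))  # 0.5
```

Averaging keeps patients with many recorded concepts comparable to those with few; swapping the IDF weights for another relevance score only changes `idf_weights`.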