Electronic Health Records (EHRs) often lack reliable annotation of patients' medical conditions. PheNorm, an automated unsupervised algorithm for identifying patients' medical conditions from EHR data, was developed previously. PheVis extends PheNorm to visit-level resolution. It combines diagnosis codes with medical concepts extracted from medical notes and incorporates past history in a machine learning approach. The result is an interpretable parametric predictor of the probability that a given medical condition occurs at each visit. PheVis is applied to two real-world use cases using the data warehouse of the University Hospital of Bordeaux: (i) rheumatoid arthritis, a chronic condition, and (ii) tuberculosis, an acute condition. Cross-validated AUROCs were 0.943 [0.940; 0.945] and 0.987 [0.983; 0.990], respectively. Cross-validated AUPRCs were 0.754 [0.744; 0.763] and 0.299 [0.198; 0.403], respectively. PheVis performs well for chronic conditions. For acute conditions, its performance in French is limited because the natural language processing tools do not exclude mentions of past medical history. It achieves significantly better performance than state-of-the-art unsupervised methods, especially for chronic diseases.
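The idea of a visit-level parametric predictor that uses past history can be illustrated with a minimal sketch. It does not reproduce PheVis. The feature set is an assumption: log-transformed counts of diagnosis codes and note-extracted concepts at the current visit, plus an exponentially decayed sum of the same features from earlier visits. The `decay` rate, the `Visit` fields, and the weights are all hypothetical placeholders. In the original method, parameters are estimated without gold-standard labels, and none of the values below come from it.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    day: float       # visit date in days since the first visit (assumed encoding)
    n_codes: int     # count of condition-specific diagnosis codes at this visit
    n_concepts: int  # count of condition-related concepts extracted from the notes


def visit_features(visits, decay=0.01):
    """Return [current codes, current concepts, past codes, past concepts] per visit.

    Past features are an exponentially decayed sum over earlier visits;
    `decay` is a hypothetical per-day rate, not a PheVis parameter.
    """
    rows = []
    for t, v in enumerate(visits):
        current = [math.log1p(v.n_codes), math.log1p(v.n_concepts)]
        past = [0.0, 0.0]
        for u in visits[:t]:
            w = math.exp(-decay * (v.day - u.day))  # older visits weigh less
            past[0] += w * math.log1p(u.n_codes)
            past[1] += w * math.log1p(u.n_concepts)
        rows.append(current + past)
    return rows


def predict_proba(features, weights, intercept):
    """Logistic model: an interpretable, parametric probability per visit."""
    probs = []
    for x in features:
        z = intercept + sum(w * xi for w, xi in zip(weights, x))
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


# Illustrative usage with placeholder (not fitted) weights.
visits = [Visit(0, 0, 0), Visit(30, 2, 3), Visit(60, 0, 1)]
probs = predict_proba(visit_features(visits), weights=[1.2, 0.8, 0.5, 0.3], intercept=-3.0)
```

Because each weight multiplies a named feature, the contribution of current codes, current concepts, and decayed history to a visit's probability can be read directly. This is the kind of interpretability a parametric predictor offers. Here, the third visit has only a weak current signal, but it keeps a higher probability than the first because of the decayed history from the second visit.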
Keywords: Electronic health records; High-throughput phenotyping; Phenotypic big data; Precision medicine.
Copyright © 2021 Elsevier Inc. All rights reserved.