The wide adoption of electronic health records (EHRs) in hospitals provides unique opportunities for high-throughput phenotyping of patients. Phenotypes can be extracted from narrative reports using either dictionary-based or data-driven methods. We developed a hybrid pipeline that uses deep learning to enrich the UMLS Metathesaurus for automatic detection of phenotypes in EHRs. The pipeline was evaluated on a French database of patients with Jeune syndrome, a rare disease characterized by skeletal abnormalities. It detected 2.5 times as many skeletal abnormalities as the baseline extraction using the standard release of the UMLS. Our method can help extend the coverage of the UMLS and improve phenotyping, especially for languages other than English.
Keywords: Named entity recognition; deep phenotyping; electronic health records; rare disease.