Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) into downstream analytics is challenging because the information is not standardized. We aimed to automatically align FH concepts derived from a clinical corpus to widely used disease category resources: Clinical Classifications Software (CCS), Phecode, the Comparative Toxicogenomics Database (CTD), the Human Phenotype Ontology (HPO), and the Human Disease Ontology (HDO). Leveraging the parent and broader/alike relations available in the Unified Medical Language System (UMLS), we achieved high mapping coverage of FH concepts in those resources. Among the five resources, CTD had the highest coverage of FH concepts (93%), HDO had the coarsest-grained disease categories, and CCS had the finest-grained. The study suggests that these ontological and terminological resources can mitigate the challenge posed by the varying granularity of FH concepts derived by natural language processing (NLP).
©2022 AMIA - All rights reserved.
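
The mapping strategy summarized in the abstract (following parent and broader/alike relations until a concept lands in a target disease category resource) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes relations are available as simple `(source, relation, target)` triples in which the target is the parent or broader concept, and it borrows the UMLS relation abbreviations `PAR` (parent), `RB` (broader), and `RL` (alike). The concept identifiers, the toy data, the `max_hops` limit, and the function names `build_graph`, `map_to_category`, and `coverage` are all hypothetical.

```python
from collections import defaultdict, deque

# Relation labels to follow: parent, broader, and "alike" (assumed triple layout).
FOLLOWED_RELS = {"PAR", "RB", "RL"}

# Hypothetical toy triples: (source concept, relation, parent/broader concept).
TOY_RELATIONS = [
    ("C_T2DM", "PAR", "C_DM"),
    ("C_DM", "RB", "C_ENDOCRINE"),
    ("C_MI", "PAR", "C_IHD"),
    ("C_DM", "CHD", "C_T2DM"),  # child relation, ignored by the traversal
]

# Hypothetical set of concepts that exist in the target category resource.
TOY_CATEGORIES = {"C_DM", "C_IHD"}


def build_graph(relations):
    """Index the followed relations as source -> list of broader concepts."""
    graph = defaultdict(list)
    for source, rel, target in relations:
        if rel in FOLLOWED_RELS:
            graph[source].append(target)
    return graph


def map_to_category(cui, graph, categories, max_hops=3):
    """Breadth-first search upward from `cui`.

    Returns (category concept, hops) for the nearest concept found in
    `categories`, or None if none is reachable within `max_hops`.
    """
    queue = deque([(cui, 0)])
    seen = {cui}
    while queue:
        current, hops = queue.popleft()
        if current in categories:
            return current, hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for broader in graph.get(current, []):
            if broader not in seen:
                seen.add(broader)
                queue.append((broader, hops + 1))
    return None


def coverage(cuis, graph, categories):
    """Fraction of concepts that map to at least one category."""
    if not cuis:
        return 0.0
    mapped = sum(map_to_category(c, graph, categories) is not None for c in cuis)
    return mapped / len(cuis)


graph = build_graph(TOY_RELATIONS)
fh_concepts = ["C_T2DM", "C_MI", "C_UNKNOWN", "C_DM"]
print(coverage(fh_concepts, graph, TOY_CATEGORIES))
```

In this sketch, the hop count returned by `map_to_category` gives a rough sense of how far a concept sits from its nearest category, which is one simple way to think about the granularity differences between resources noted in the abstract.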