Data-driven method to enhance craniofacial and oral phenotype vocabularies

J Am Dent Assoc. 2019 Nov;150(11):933-939.e2. doi: 10.1016/j.adaj.2019.05.029.

Abstract

Background: A large amount of clinical information is captured as free-text narratives and could be put to better use in applications such as clinical decision support, ontology development, evidence-based practice, and research. The Human Phenotype Ontology (HPO) is used specifically for semantic comparisons for diagnostic purposes. All of these applications require good coverage of the domain of interest. The authors used natural language processing to capture craniofacial and oral phenotype signatures from electronic health records and then used these signatures to evaluate the coverage of existing oral phenotype ontologies.

Methods: The authors applied a text-processing pipeline based on the clinical Text Analysis and Knowledge Extraction System (cTAKES) to annotate the clinical notes with Unified Medical Language System (UMLS) codes. The authors extracted the disease or disorder phenotype terms, which were then compared with HPO terms and their synonyms.
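
The comparison step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the extracted terms are already available as plain strings and that matching is done on normalized text, whereas the study worked from UMLS-coded annotations. The function names, the placeholder identifiers, and the toy data are all hypothetical.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def find_missing_terms(extracted_terms, hpo_entries):
    """Return extracted terms that match no HPO label or synonym.

    hpo_entries maps an ontology identifier to a list of strings
    (the preferred label followed by its synonyms).
    """
    known = {normalize(name) for names in hpo_entries.values() for name in names}
    seen = set()
    missing = []
    for term in extracted_terms:
        key = normalize(term)
        if key in seen:
            continue  # report each unique term once
        seen.add(key)
        if key not in known:
            missing.append(term)
    return missing


# Toy data invented for illustration; identifiers are placeholders,
# not real HPO ids, and nothing here comes from the study.
hpo_entries = {
    "ID:0001": ["Cleft palate", "Palatoschisis"],
    "ID:0002": ["Carious teeth", "Dental caries"],
}
extracted = ["Dental caries", "cleft  palate", "Example unmapped term"]
missing = find_missing_terms(extracted, hpo_entries)
```

Terms left in `missing` would be candidates for manual review rather than automatic additions, since a string mismatch may reflect a spelling variant instead of a true gap in the ontology.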

Results: The authors retrieved 2,153 deidentified clinical notes from 558 patients. From these notes, 2,416 unique disease or disorder phenotype terms were extracted, including 210 craniofacial or oral phenotype terms. Twenty-six of these craniofacial or oral phenotypes were not found in the HPO.

Conclusions: The authors demonstrated that natural language processing tools could extract relevant phenotype terms from clinical narratives, which could help identify gaps in existing ontologies and enhance craniofacial and dental phenotyping vocabularies.

Practical implications: The expansion of terms in the dental, oral, and craniofacial domains in the HPO is particularly important as the dental community moves toward electronic health records.

Keywords: Natural language processing; craniofacial and oral phenotypes; evidence-based dentistry; ontology.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Electronic Health Records
  • Humans
  • Narration
  • Natural Language Processing*
  • Phenotype
  • Vocabulary*