An Integrated Pipeline for Phenotypic Characterization, Clustering and Visualization of Patient Cohorts in a Rare Disease-Oriented Clinical Data Warehouse

Stud Health Technol Inform. 2024 Aug 22;316:1785-1789. doi: 10.3233/SHTI240777.

Abstract

Rare diseases pose significant challenges because they are heterogeneous and poorly understood. This study develops a comprehensive pipeline, interoperable with a document-oriented clinical data warehouse, that integrates cohort characterization, patient clustering and interpretation. By combining natural language processing (NLP), semantic similarity, machine learning and visualization, the pipeline identifies prevalent phenotype patterns and stratifies patients. To enhance interpretability, it reports the discriminant phenotypes that characterize each cluster. Users can visually test hypotheses by marking patients whose electronic health record (EHR) contains specific keywords, such as genes, drugs and procedures. Implemented through a web interface, the pipeline lets clinicians navigate between modules, discover intricate patterns and generate interpretable insights that may advance the understanding of rare diseases, guide decision-making and ultimately improve patient outcomes.
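As an illustration only, the stratification idea described above (compare patients by their phenotypes, group similar patients, then report the phenotypes that set each group apart) can be sketched in a few lines of Python. This is not the authors' implementation: the toy cohort, the function names `jaccard`, `cluster` and `discriminant_phenotypes`, the 0.5 threshold, and the use of Jaccard overlap in place of an ontology-based semantic similarity are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy cohort: patient id -> phenotype terms (e.g. as extracted by NLP).
cohort = {
    "p1": {"seizures", "hypotonia", "microcephaly"},
    "p2": {"seizures", "hypotonia"},
    "p3": {"seizures", "microcephaly", "hypotonia"},
    "p4": {"cardiomyopathy", "arrhythmia"},
    "p5": {"cardiomyopathy", "arrhythmia", "fatigue"},
}


def jaccard(a, b):
    """Set-overlap similarity; an assumed stand-in for a semantic similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(cohort, threshold=0.5):
    """Link patients whose similarity reaches the threshold and return the
    resulting connected components (single-linkage style grouping)."""
    parent = {p: p for p in cohort}

    def find(p):
        # Follow parent links to the component root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(cohort, 2):
        if jaccard(cohort[a], cohort[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in cohort:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=lambda g: sorted(g))


def discriminant_phenotypes(cohort, members, top=2):
    """Rank a cluster's phenotypes by (frequency inside - frequency outside)."""
    outside = [p for p in cohort if p not in members]
    terms = set().union(*(cohort[p] for p in members))

    def freq(term, group):
        return sum(term in cohort[p] for p in group) / len(group) if group else 0.0

    scored = [(freq(t, members) - freq(t, outside), t) for t in terms]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ties by name
    return [t for _, t in scored[:top]]


clusters = cluster(cohort)
for members in clusters:
    print(sorted(members), discriminant_phenotypes(cohort, members))
```

On this toy data the sketch separates the three neurological patients from the two cardiac ones. A real pipeline would likely replace the overlap measure with a similarity defined over a phenotype ontology and use a more robust clustering algorithm.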

Keywords: Clustering; electronic health record; rare disease; visualization.

MeSH terms

  • Cluster Analysis
  • Data Warehousing
  • Electronic Health Records*
  • Humans
  • Machine Learning
  • Natural Language Processing
  • Phenotype*
  • Rare Diseases*
  • User-Computer Interface