Latent topic discovery of clinical concepts from hospital discharge summaries of a heterogeneous patient cohort

Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:1773-6. doi: 10.1109/EMBC.2014.6943952.

Abstract

Patients in critical care often exhibit complex disease patterns. A fundamental challenge in clinical research is to identify clinical features that may be characteristic of adverse patient outcomes. In this work, we propose a data-driven approach to phenotype discovery in critical care patients. We used the Hierarchical Dirichlet Process (HDP), a non-parametric topic modeling technique, to automatically discover the latent "topic" structure of diseases, symptoms, and findings documented in hospital discharge summaries. We show that this latent topic structure can reveal phenotypic patterns of diseases and symptoms shared across subgroups of a patient cohort, and that it may have prognostic value in stratifying patients' mortality risk after hospital discharge. Using discharge summaries of a large patient cohort from the MIMIC II database, we evaluate the clinical utility of the discovered topic structure in identifying patients at high risk of mortality within one year of hospital discharge. We demonstrate that the learned topic structure has statistically significant associations with post-discharge mortality, and may provide valuable insights for defining new feature sets to predict patient outcomes.
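To make the "non-parametric topic model" idea concrete, the sketch below samples from the HDP generative process under a truncated stick-breaking approximation. It is an illustration of the model's assumptions only, not the authors' pipeline: the abstract does not state the inference algorithm, preprocessing, or hyperparameters. The function names, the truncation level `truncation`, the concentrations `gamma` and `alpha`, the topic-word prior `eta`, and the integer word ids (standing in for hypothetical clinical concept codes) are all assumptions made for this example. Corpus-level weights `beta` come from a GEM(`gamma`) stick-breaking draw, and each document (discharge summary) draws its own topic proportions from a DP centred on `beta`. This sharing of `beta` is what lets topics recur across subgroups of a cohort.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated GEM(concentration) weights: beta_k = v_k * prod_{l<k}(1 - v_l)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last stick absorbs the leftover mass
    return weights


def dirichlet(alphas, rng):
    """Sample Dirichlet(alphas) by normalizing independent Gamma(a, 1) draws."""
    # Floor guards against a zero-mass stick (gammavariate needs alpha > 0).
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_truncated_hdp(n_docs, doc_len, vocab_size, truncation=10,
                         gamma=1.0, alpha=1.0, eta=0.5, seed=0):
    """Hypothetical sketch of the HDP generative process (truncated at K topics)."""
    rng = random.Random(seed)
    # Corpus-level topic weights shared by every document: beta ~ GEM(gamma).
    beta = stick_breaking(gamma, truncation, rng)
    # Topic-word distributions: phi_k ~ Dirichlet(eta, ..., eta).
    topics = [dirichlet([eta] * vocab_size, rng) for _ in range(truncation)]
    doc_topic, docs = [], []
    for _ in range(n_docs):
        # pi_j ~ DP(alpha, beta); with K atoms this is exactly Dirichlet(alpha * beta).
        pi = dirichlet([alpha * b for b in beta], rng)
        words = []
        for _ in range(doc_len):
            k = rng.choices(range(truncation), weights=pi)[0]   # topic for this token
            w = rng.choices(range(vocab_size), weights=topics[k])[0]  # word from topic k
            words.append(w)
        doc_topic.append(pi)
        docs.append(words)
    return beta, topics, doc_topic, docs


if __name__ == "__main__":
    beta, topics, doc_topic, docs = sample_truncated_hdp(n_docs=3, doc_len=10, vocab_size=6)
    print([round(b, 3) for b in beta])
    print(docs[0])
```

In an actual analysis the direction is reversed: the per-document topic proportions and the topics themselves are inferred from the observed summaries (for example by Gibbs sampling or variational inference), and the truncation level is only an approximation bound rather than a fixed number of topics. The inferred per-document proportions are the kind of features one could then test for association with one-year post-discharge mortality.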

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Critical Care
  • Databases, Factual
  • Humans
  • Middle Aged
  • Mortality
  • Patient Discharge*
  • Pattern Recognition, Automated
  • Prognosis