Determination of Marital Status of Patients from Structured and Unstructured Electronic Healthcare Data

AMIA Annu Symp Proc. 2020 Mar 4;2019:267-274. eCollection 2019.

Abstract

Social determinants of health, including marital status, are increasingly recognized as key drivers of health care utilization. This paper describes a robust method to determine the marital status of patients using structured and unstructured electronic healthcare data from a single academic institution in the United States. We developed and validated a natural language processing (NLP) pipeline for the ascertainment of marital status from clinical notes and compared its performance against two baseline methods: a machine learning n-gram model, and structured data obtained from the electronic health record. Overall, our NLP engine had excellent performance on both document-level (F1 0.97) and patient-level (F1 0.95) classification, and it outperformed the baseline machine learning n-gram model. We also observed good agreement between the marital status obtained from our NLP engine and the baseline structured electronic healthcare data (κ 0.6).
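
For readers unfamiliar with the two statistics reported above, the following is a minimal Python sketch of how an F1 score and Cohen's kappa (κ) are computed from paired labels. It is an illustration only and is not the authors' evaluation code. The function names (`f1_score`, `cohen_kappa`), the label values, and the eight toy labels are all hypothetical and do not come from the study data.

```python
from collections import Counter


def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two sources' marginal proportions per class.
    expected = sum(
        counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (NOT study data): one label per patient from each source.
nlp_labels = ["married", "married", "single", "single",
              "married", "single", "married", "single"]
ehr_labels = ["married", "married", "single", "married",
              "married", "single", "single", "single"]

print("F1 (married):", f1_score(ehr_labels, nlp_labels, "married"))
print("kappa:", cohen_kappa(nlp_labels, ehr_labels))
```

In this sketch the structured labels are treated as the reference when computing F1, purely for illustration. Kappa is symmetric in its two inputs, so it measures agreement between the two sources without assuming either one is correct.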

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Electronic Health Records*
  • Female
  • Hospitals, University
  • Humans
  • Machine Learning*
  • Male
  • Marital Status*
  • Natural Language Processing*
  • Social Determinants of Health
  • Utah