Social determinants of health, including marital status, are increasingly recognized as key drivers of health care utilization. This paper describes a robust method for determining the marital status of patients using structured and unstructured electronic healthcare data from a single academic institution in the United States. We developed and validated a natural language processing (NLP) pipeline for ascertaining marital status from clinical notes and compared its performance against two baselines: a machine learning n-gram model and structured data obtained from the electronic health record. Overall, our NLP engine performed excellently on both document-level (F1 0.97) and patient-level (F1 0.95) classification and outperformed the baseline machine learning n-gram model. We also observed good agreement between the marital status obtained from our NLP engine and the baseline structured electronic healthcare data (κ 0.6).
©2019 AMIA - All rights reserved.