Background: Electronic health records (EHRs) are a rich source of health information; however, social determinants of health, including incarceration, and their effects on health and health care disparities can be hard to extract.
Objective: The main objective of this study was to assess the sensitivity and specificity of various methods of identifying incarceration exposure using the EHR, with patient self-report as the comparison standard.
Research design: Validation study using multiple data sources and types.
Subjects: Participants of the Veterans Aging Cohort Study (VACS), a national observational cohort based on data from the Veterans Health Administration (VHA) EHR that includes all human immunodeficiency virus-infected patients in care (47,805) and uninfected patients (99,060) matched on region, age, race/ethnicity, and sex.
Measures and data sources: Self-reported incarceration history compared with: (1) VHA EHR data linked to administrative data from a state Department of Correction (DOC), (2) VHA EHR data linked to administrative data on incarceration from the Centers for Medicare and Medicaid Services (CMS), (3) VHA EHR-specific identifier codes indicative of receipt of VHA incarceration reentry services, and (4) natural language processing (NLP) applied to unstructured text in the VHA EHR.
Results: Linking the EHR to DOC data: sensitivity 2.5%, specificity 100%; linking the EHR to CMS data: sensitivity 7.9%, specificity 99.3%; VHA EHR-specific identifier for receipt of reentry services: sensitivity 7.3%, specificity 98.9%; and NLP: sensitivity 63.5%, specificity 95.9%.
Conclusions: NLP tools hold promise as a feasible and valid method to identify individuals with exposure to incarceration in the EHR. Future work should expand this approach to a larger body of documents and refine the methods, which may further improve the operating characteristics of this method.
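Illustrative sketch: the operating characteristics reported in the Results can be computed by comparing each EHR-based indicator against self-report as the comparison standard. The Python below is a minimal sketch under that assumption, not the study's actual analysis code; the function name `sensitivity_specificity` and the toy data are hypothetical and are not drawn from the VACS cohort.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    reference: Iterable[bool], predicted: Iterable[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `reference`.

    `reference` is the comparison standard (here assumed to be self-reported
    incarceration history); `predicted` is the EHR-based indicator.
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1  # self-report positive, EHR method positive
        elif ref and not pred:
            fn += 1  # self-report positive, EHR method missed it
        elif not ref and pred:
            fp += 1  # self-report negative, EHR method flagged it
        else:
            tn += 1  # both negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data for illustration only (not study data).
self_report = [True, True, True, True, False, False, False, False, False, False]
ehr_flag = [True, True, False, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(self_report, ehr_flag)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this framing, a method with low sensitivity and high specificity rarely flags a patient incorrectly but misses most patients who self-report incarceration.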