Introduction: Homelessness contributes to worsening health and increased health care costs. Little published research has leveraged rich electronic health record (EHR) data to predict future homelessness risk and inform interventions to address it. The authors' objective was to develop a model for predicting future homelessness using individual EHR and geographic covariates.

Methods: This retrospective cohort study included 2,543,504 adult members (aged ≥18 years) of Kaiser Permanente Northern California and evaluated which covariates predicted a composite homelessness outcome (hospital discharge documentation of a homeless patient, a medical diagnosis of homelessness, an approved medical financial assistance application for homelessness, and/or "homeless/shelter" in the address name). Predictors, including prior diagnoses and demographic and geographic data, were measured in 2018-2019; the outcome was measured in 2020. The cohort was split 70:30 into derivation and validation sets, and the outcome was modeled with logistic regression.

Results: Homelessness prevalence in the overall sample was 0.35%. The final logistic regression model included 26 predictors spanning prior diagnoses, demographic characteristics, and geographic-level variables. In the validation set, the model had moderate sensitivity (80.4%) and specificity (83.2%) for predicting future homelessness and showed excellent discrimination (area under the receiver operating characteristic curve [AUC] = 0.891; 95% confidence interval = 0.884-0.897).

Discussion: This prediction model can serve as an initial triage step to enhance screening and referral tools for identifying and addressing homelessness, which can improve health and reduce health care costs.

Conclusions: EHR data can be used to predict the likelihood of homelessness at the population health level.
Keywords: homelessness risk; insured adults; prediction model.
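The validation metrics summarized in the abstract (a 70:30 derivation/validation split, sensitivity and specificity at a cutoff, and AUC with a 95% confidence interval) can be illustrated with a minimal Python sketch. This is not the authors' code: model fitting is omitted, and `scores` are assumed to be predicted probabilities from an already fitted logistic regression. The function names, the probability threshold, the random seed, and the percentile-bootstrap confidence interval are all illustrative assumptions, since the abstract does not state how the cutoff or interval was derived.

```python
import random


def split_70_30(ids, seed=0):
    """Randomly partition member IDs into derivation (70%) and validation (30%) sets.
    The seed is a hypothetical choice for reproducibility."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(y_true, scores, threshold):
    """Flag score >= threshold as predicted homelessness; return (sensitivity, specificity).
    The threshold is an assumption: the abstract does not report the cutoff used."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUC: the probability that a random positive case
    scores higher than a random negative case, with tied scores counted as 1/2."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUC (an assumed method, not stated in the source)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes to define an AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

The rank-based AUC runs in O(n log n), which matters at this scale: a 30% validation set of this cohort would hold roughly 760,000 members, with only about 0.35% of them positive, so a pairwise comparison of every positive and negative case would be far slower.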