A Novel Application of Non-Negative Matrix Factorization to the Prediction of the Health Status of Undocumented Immigrants

Health Equity. 2021 Dec 13;5(1):834-839. doi: 10.1089/heq.2021.0079. eCollection 2021.

Abstract

Introduction: Undocumented immigrants (UIs) in the United States are less likely to be able to afford health insurance. As a result, UIs often lack family doctors and rarely participate in annual screening programs, which makes estimating their health status particularly challenging. This is especially true when the laboratory results from limited screening programs fail to provide sufficient clinical information.

Methods: To address this issue, we developed a machine learning model based on the non-negative matrix factorization technique. The data set used for model training and testing was obtained from the 2004 cost-free hepatitis B screening program at the Omni Health Center in Plano, Texas. A total of 300 people participated, 199 of whom were identified as UIs.

Results: People in the UI group had higher cholesterol (219.6 mg/dL, p=0.038) and triglyceride (173.2 mg/dL, p=0.03) levels. They also had a lower hepatitis B vaccination rate (38%, p=0.0247). No significant difference in the rate of hepatitis B positivity was found (p=0.8823). Using 16 individual clinical measurements as training features, our newly developed model achieved 67.56% accuracy in predicting the ratio of cholesterol to high-density lipoprotein; in addition, it performed 9.1% better than a comparable multiclass logistic regression model.

Conclusions: Elderly UIs have poorer health status than permanent residents and citizens in the United States. Our newly developed machine learning model offers a powerful support tool for designing health intervention programs that target UIs in the United States.

Keywords: cardiovascular risk; machine learning; undocumented immigrants.