The goal of this study was to build a machine learning model for early prostate cancer prediction based on healthcare utilization patterns. We examined the frequency and pattern changes of healthcare utilization in 2916 prostate cancer patients 3 years prior to their prostate cancer diagnoses and explored several supervised machine learning techniques to predict possible prostate cancer diagnosis. Analysis of patients' medical activities between 1 year and 2 years prior to their prostate cancer diagnoses using XGBoost model provided the best prediction accuracy with high F1 score (0.9) and AUC score (0.73). These pilot results indicated that application of machine learning to healthcare utilization patterns may result in early identification of prostate cancer diagnosis.
Keywords: Big Data Analytics; Machine Learning; Prostate Cancer.