Previous studies have used machine-learning analysis to build statistical models for predicting the risk of suicidal behavior, but high apparent accuracy in such models may reflect overfitting, and internal validation cannot completely address this problem. In this study, we created models for predicting the occurrence of suicide attempts among Koreans at high risk of suicide, and we verified these models in an independent cohort. We performed logistic and penalized regression for suicide attempts within 6 months among suicidal ideators and attempters in the Korean Cohort for the Model Predicting a Suicide and Suicide-related Behavior (K-COMPASS). We then validated the models in a test cohort. Several factors significantly predicted suicide attempts in the models, including young age, suicidal ideation, previous suicide attempts, anxiety, alcohol abuse, stress, and impulsivity. For the logistic model after variable selection, the area under the curve (AUC) and positive predictive value (PPV) were 0.941 and 0.484 in the training cohort and 0.751 and 0.084 in the test cohort. The corresponding values for the penalized regression model were 0.943 and 0.524 in the original training cohort and 0.794 and 0.115 in the test cohort. The prediction model constructed through a prospective cohort study of a group at high risk of suicide showed satisfactory accuracy even in the test cohort. The accuracy with penalized regression was greater than that with the "classical" logistic model.
Keywords: Cohort; Penalized regression model; Prediction model; Suicide; Suicide-attempt.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.
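The abstract reports two metrics, the area under the ROC curve (AUC) and the positive predictive value (PPV). As a rough illustration of how these quantities are defined, the following minimal sketch computes both from predicted risk scores. The function names, the 0.5 threshold, and the toy labels and scores are all hypothetical and chosen for illustration only; they are not taken from the study or its data.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv(labels, scores, threshold):
    """PPV: among cases flagged at or above the threshold,
    the fraction that are true positives."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        return float("nan")  # nothing flagged, PPV undefined
    return sum(flagged) / len(flagged)


# Hypothetical toy data (NOT study data): 1 = event occurred, 0 = no event.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1, 0.05]

toy_auc = auc(labels, scores)
toy_ppv = ppv(labels, scores, threshold=0.5)  # threshold is an assumption
print(f"AUC = {toy_auc:.3f}, PPV = {toy_ppv:.3f}")
```

AUC depends only on how well the scores rank cases, whereas PPV also depends on the chosen threshold and on how common the outcome is in the cohort, so the two metrics can diverge considerably.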