Traumatic experiences can give rise to post-traumatic stress disorder (PTSD), a debilitating psychiatric condition associated with impairments in both social and occupational functioning. There has been great interest in using machine learning to predict the development of PTSD in trauma patients from clinician assessments or survey-based psychological assessments. However, these assessments comprise a large number of questions, which makes them time-consuming and difficult to administer. In this paper, we aim to predict whether patients develop PTSD 3 months post-trauma from multiple survey-based assessments taken within 2 weeks post-trauma. Our objective is to minimize the number of survey questions that patients need to answer while maintaining the prediction accuracy obtained with the full surveys. We formulate this as a feature selection problem and consider four feature selection approaches. We demonstrate that it is possible to achieve up to 72% accuracy in predicting the 3-month PTSD diagnosis from 10 survey questions using a feature selector based on mean decrease in impurity followed by a gradient boosting classifier.
Keywords: Feature selection; Gradient boosting; Mean decrease in impurity; PTSD prognosis; Random forest; Survey optimization.
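
As an illustration of the kind of pipeline summarized above, the following is a minimal sketch rather than the implementation used in the paper. It assumes scikit-learn, takes the mean decrease in impurity scores from a random forest, and uses synthetic data as a stand-in for the survey responses. The helper name `select_top_questions`, the feature count of 60, and all hyperparameters are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

N_QUESTIONS = 10  # number of survey questions to keep


def select_top_questions(X_train, y_train, k=N_QUESTIONS, seed=0):
    """Rank features by mean decrease in impurity (MDI) and keep the top k.

    Hypothetical helper: MDI is read from a random forest's
    feature_importances_ attribute, computed on training data only.
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    ranked = np.argsort(forest.feature_importances_)[::-1]  # highest MDI first
    return np.sort(ranked[:k])


# Synthetic stand-in for the survey data (assumption): each column plays the
# role of one survey question, the label plays the role of the 3-month diagnosis.
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: choose the questions using the training split only.
selected = select_top_questions(X_train, y_train)

# Step 2: fit a gradient boosting classifier on the reduced question set.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train[:, selected], y_train)

accuracy = clf.score(X_test[:, selected], y_test)
print("selected question indices:", selected.tolist())
print("held-out accuracy on synthetic data:", round(accuracy, 3))
```

The selector is fitted on the training split alone so that the held-out estimate is not biased by the selection step. The accuracy printed here describes the synthetic data only and has no bearing on the 72% figure reported above.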