Background: Medical education is moving toward evidence-based guideline development; however, controlled data are lacking for complex treatment decisions such as those made when assessing suicide attempters. A new set of statistical techniques called data mining (or machine learning) is used in various industries to explore complex databases and can likewise be applied to large clinical databases.
Method: The goal was to use data mining techniques to reanalyze a published study of the variables that predicted psychiatrists' decisions to hospitalize 509 suicide attempters over the age of 18 years who were assessed in the emergency department. Patients were recruited for the study between 1996 and 1998. Traditional multivariate statistics were compared with data mining techniques to determine the variables predicting hospitalization.
Results: Five analyses done by psychiatric researchers using traditional statistical techniques classified 72% to 88% of patients correctly. The model developed by researchers with no psychiatric knowledge and employing data mining techniques used 5 variables (drug consumption during the attempt, relief that the attempt was not effective, lack of family support, being a housewife, and family history of suicide attempts) and classified 99% of patients correctly (99% sensitivity and 100% specificity).
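To illustrate how figures such as percent correctly classified, sensitivity, and specificity are defined, the following minimal sketch computes them from the four cells of a 2x2 confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study; hospitalization is assumed to be the positive class.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: hospitalized patients predicted as hospitalized (true positives)
    fn: hospitalized patients predicted as not hospitalized (false negatives)
    tn: non-hospitalized patients predicted as not hospitalized (true negatives)
    fp: non-hospitalized patients predicted as hospitalized (false positives)
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        # Share of all patients classified correctly
        "accuracy": (tp + tn) / (positives + negatives),
        # Share of hospitalized patients the model identifies
        "sensitivity": tp / positives,
        # Share of non-hospitalized patients the model identifies
        "specificity": tn / negatives,
    }


# Hypothetical counts for illustration only; NOT the study's data.
example = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

With these illustrative counts, sensitivity depends only on the hospitalized group and specificity only on the non-hospitalized group, so the overall percent classified correctly can differ from both.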
Conclusions: This reanalysis of a published study is intended to show that these new multivariate techniques, called data mining, can be used to study large clinical databases in psychiatry. Data mining techniques may be used to explore important treatment questions and outcomes in large clinical databases and to help develop guidelines for problems where controlled data are difficult to obtain. Data mining analyses may thus open new opportunities for sound clinical research.