Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality

Int J Environ Res Public Health. 2020 Jan 31;17(3):897. doi: 10.3390/ijerph17030897.

Abstract

(1) Medical research has shown increasing interest in machine learning, which permits the analysis of massive multivariate data. We therefore developed models to predict drug intoxication mortality and compared machine learning models with traditional logistic regression. (2) We extracted 8,937 samples categorized as drug intoxication from the Korea Centers for Disease Control and Prevention (2008-2017). We trained, validated, and tested each model on these data and compared their performance using three measures: Brier score, calibration slope, and calibration-in-the-large. (3) A chi-square test showed that mortality risk differed significantly according to severity, intent, toxic substance, age, and sex. The multilayer perceptron model (MLP) had the highest area under the curve (AUC) and the lowest Brier score in the training and validation phases, while the logistic regression model (LR) showed the highest AUC (0.827) and lowest Brier score (0.0307) in the testing phase. MLP also had the second-highest AUC (0.816) and second-lowest Brier score (0.003258) in the testing phase, demonstrating better performance than the decision tree model. (4) Given the complexity of choosing tuning parameters for machine learning models, LR proved competitive on medical datasets, which require strict accuracy.
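The performance measures named above (Brier score, calibration slope, calibration-in-the-large, and AUC) can be illustrated with a minimal sketch. It uses the common textbook definitions, which the abstract does not spell out, so it should not be read as the authors' implementation. The function names and the toy arrays `y_true` and `p_hat` are hypothetical and are not taken from the study data.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return math.log(p / (1 - p))

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_in_the_large(y, p, iters=50):
    """Intercept a in logit(P(y=1)) = a + logit(p), slope fixed at 1 (ideal: 0)."""
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):  # one-parameter Newton-Raphson
        mu = [_sigmoid(a + l) for l in lp]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        hess = sum(mi * (1 - mi) for mi in mu)
        a += grad / hess
    return a

def calibration_slope(y, p, iters=50):
    """Slope b in logit(P(y=1)) = a + b * logit(p) (ideal: 1)."""
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):  # two-parameter Newton-Raphson
        mu = [_sigmoid(a + b * l) for l in lp]
        w = [mi * (1 - mi) for mi in mu]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * l for yi, mi, l in zip(y, mu, lp))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, lp))
        h11 = sum(wi * l * l for wi, l in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Hypothetical toy data: observed outcomes (1 = death) and predicted risks.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
p_hat = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print("Brier score:", brier_score(y_true, p_hat))
print("AUC:", auc(y_true, p_hat))
print("Calibration-in-the-large:", calibration_in_the_large(y_true, p_hat))
print("Calibration slope:", calibration_slope(y_true, p_hat))
```

Lower Brier scores and higher AUC values indicate better performance. A calibration slope near 1 and a calibration-in-the-large near 0 indicate that predicted risks match observed mortality on average. The Newton-Raphson fits assume the outcomes are not perfectly separated by the predictions.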

Keywords: drug intoxication; influencing factor; logistic regression; machine learning; mortality prediction.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alcoholic Intoxication / mortality*
  • Area Under Curve
  • Logistic Models*
  • Machine Learning*
  • Mortality
  • Neural Networks, Computer*
  • Republic of Korea
  • Risk Assessment