Prediction of 5-Year Survival with Data Mining Algorithms

Stud Health Technol Inform. 2015;213:75-8.

Abstract

Predicting survival time at the time of diagnosis is of great importance for decisions about treatment and long-term follow-up care. However, predicting the outcome of cancer from clinical information is a challenging task. In this study, we examined the ability of ten data mining algorithms (Perceptron, Rule Induction, Support Vector Machine, Linear Regression, Naïve Bayes, Decision Tree, k-Nearest Neighbor, Logistic Regression, Neural Network, Random Forest) to predict the dichotomous attribute "5-year-survival" from seven attributes (sex, UICC stage, etc.) that are available at the time of diagnosis. For this study, we used the nationwide German research data set on colon cancer provided by the Robert Koch Institute. To assess the results, we compared the algorithms' predictions with physicians' opinions: physicians estimated the survival time using the same seven attributes. The average accuracy of the physicians' estimates was 59%, whereas the average accuracy of the machine learning algorithms was 67.7%.
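The abstract does not state exactly how the two average accuracies were computed. The following is a minimal sketch, assuming that "average accuracy" means the unweighted mean of per-rater accuracies, where a rater is either one physician or one algorithm, each predicting the binary 5-year-survival label for the same cases. The function names, rater names, and toy labels are hypothetical and are not the study's data.

```python
from statistics import mean


def accuracy(predicted, actual):
    """Fraction of cases whose predicted 5-year-survival label matches the observed label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def average_accuracy(predictions_by_rater, actual):
    """Unweighted mean of per-rater accuracies (assumed definition; a rater is a physician or an algorithm)."""
    return mean(accuracy(preds, actual) for preds in predictions_by_rater.values())


# Toy labels, purely illustrative: 1 = survived 5 years, 0 = did not.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
raters = {
    "rater_a": [1, 0, 0, 1, 0, 1, 1, 0],  # 6 of 8 correct -> 0.75
    "rater_b": [1, 1, 1, 1, 0, 0, 0, 1],  # 5 of 8 correct -> 0.625
}
print(average_accuracy(raters, actual))  # 0.6875
```

Because the outcome is dichotomous, accuracy here is simply the share of correctly classified patients. A weighted average or a pooled accuracy over all predictions would be equally plausible readings of the abstract.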

MeSH terms

  • Age Factors
  • Algorithms*
  • Bayes Theorem
  • Colonic Neoplasms / mortality*
  • Colonic Neoplasms / pathology
  • Data Mining / methods*
  • Decision Trees
  • Humans
  • Linear Models
  • Machine Learning
  • Neoplasm Grading
  • Neoplasm Staging
  • Reproducibility of Results
  • Sex Factors
  • Survival Analysis