Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model

Arch Iran Med. 2014 Dec;17(12):837-43.

Abstract

Background: This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, decision tree and neural network models, in labeling gastrointestinal prescriptions in Iran with a diagnosis.

Methods: This study was conducted in three phases: data preparation, training, and testing. A sample was drawn from a database of 23 million pharmacy insurance claim records covering 2004 to 2011; a total of 330 prescriptions were assessed and used both to train and to test the models. In the training phase, a physician and a pharmacist each assessed the selected prescriptions separately and assigned a diagnosis. To test the performance of each model, stratified k-fold cross-validation was conducted, and sensitivity and specificity were measured.
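
As an illustration of the stratified k-fold cross-validation mentioned in the Methods, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the value of k or the software used, so `k=5`, the function names `stratified_k_fold` and `train_test_splits`, and the toy class labels "A", "B", and "C" are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds,
        # so per-class counts differ by at most one between folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_k_fold(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical toy labels (not data from the study): 20 "A", 10 "B", 5 "C".
toy_labels = ["A"] * 20 + ["B"] * 10 + ["C"] * 5
toy_folds = stratified_k_fold(toy_labels, k=5)
toy_splits = list(train_test_splits(toy_labels, k=5))
```

With these toy labels, every fold holds four "A", two "B", and one "C" sample, so each test fold mirrors the class mix of the whole sample. This matters when, as in a sample of 330 prescriptions, some diagnosis classes may be rare.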

Results: Overall, the two methods had very similar accuracy. On the weighted averages of true positive rate (sensitivity) and true negative rate (specificity), the decision tree classified slightly more accurately than the ANN (83.3% and 96% versus 80.3% and 95.1%, respectively). However, on the weighted average of ROC area (AUC between each class and all other classes), the ANN predicted the diagnosis more accurately (93.8% compared with 90.6%).
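
The weighted averages reported above can be sketched as follows. This is a hedged illustration, not the study's computation: it assumes each class is weighted by its share of the true labels (the abstract does not specify the weighting), and the function names, toy labels, and toy scores are hypothetical.

```python
from collections import Counter


def weighted_sensitivity_specificity(y_true, y_pred):
    """Class-frequency-weighted one-vs-rest sensitivity and specificity."""
    n = len(y_true)
    support = Counter(y_true)
    sens_sum = spec_sum = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fn = count - tp
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        sens_sum += count * sensitivity
        spec_sum += count * specificity
    return sens_sum / n, spec_sum / n


def one_vs_rest_auc(y_true, scores, cls):
    """AUC of `cls` versus all other classes, by pairwise score comparison."""
    pos = [s for t, s in zip(y_true, scores) if t == cls]
    neg = [s for t, s in zip(y_true, scores) if t != cls]
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def weighted_auc(y_true, class_scores):
    """Average the per-class AUCs, weighted by class frequency."""
    support = Counter(y_true)
    total = sum(
        support[c] * one_vs_rest_auc(y_true, class_scores[c], c) for c in support
    )
    return total / len(y_true)


# Hypothetical toy data (not from the study).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "A"]
class_scores = {
    "A": [0.9, 0.8, 0.3, 0.2, 0.1, 0.6],
    "B": [0.05, 0.1, 0.6, 0.7, 0.8, 0.1],
    "C": [0.05, 0.1, 0.1, 0.1, 0.1, 0.3],
}
sens, spec = weighted_sensitivity_specificity(y_true, y_pred)
auc = weighted_auc(y_true, class_scores)
```

Under this weighting, the weighted sensitivity equals the overall fraction of correctly classified samples. The AUC instead depends on how the model ranks samples by score rather than on its final predicted label, which is one way a model can score lower on sensitivity yet higher on AUC, as the ANN does here.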

Conclusion: According to the results of this study, the artificial neural network and decision tree models show similar accuracy in labeling GI prescriptions with a diagnosis.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods*
  • Databases, Factual*
  • Decision Trees*
  • Epidemiologic Research Design
  • Gastrointestinal Diseases / diagnosis
  • Gastrointestinal Diseases / drug therapy
  • Gastrointestinal Diseases / epidemiology*
  • Humans
  • Insurance, Pharmaceutical Services*
  • Iran / epidemiology
  • Models, Statistical
  • Neural Networks, Computer*