Superiority of Classification Tree versus Cluster, Fuzzy and Discriminant Models in a Heartbeat Classification System

PLoS One. 2015 Oct 13;10(10):e0140123. doi: 10.1371/journal.pone.0140123. eCollection 2015.

Abstract

This study presents a 2-stage heartbeat classifier of supraventricular (SVB) and ventricular (VB) beats. Stage 1 makes computationally-efficient classification of SVB-beats, using simple correlation threshold criterion for finding close match with a predominant normal (reference) beat template. The non-matched beats are next subjected to measurement of 20 basic features, tracking the beat and reference template morphology and RR-variability for subsequent refined classification in SVB or VB-class by Stage 2. Four linear classifiers are compared: cluster, fuzzy, linear discriminant analysis (LDA) and classification tree (CT), all subjected to iterative training for selection of the optimal feature space among extended 210-sized set, embodying interactive second-order effects between 20 independent features. The optimization process minimizes at equal weight the false positives in SVB-class and false negatives in VB-class. The training with European ST-T, AHA, MIT-BIH Supraventricular Arrhythmia databases found the best performance settings of all classification models: Cluster (30 features), Fuzzy (72 features), LDA (142 coefficients), CT (221 decision nodes) with top-3 best scored features: normalized current RR-interval, higher/lower frequency content ratio, beat-to-template correlation. Unbiased test-validation with MIT-BIH Arrhythmia database rates the classifiers in descending order of their specificity for SVB-class: CT (99.9%), LDA (99.6%), Cluster (99.5%), Fuzzy (99.4%); sensitivity for ventricular ectopic beats as part from VB-class (commonly reported in published beat-classification studies): CT (96.7%), Fuzzy (94.4%), LDA (94.2%), Cluster (92.4%); positive predictivity: CT (99.2%), Cluster (93.6%), LDA (93.0%), Fuzzy (92.4%). CT has superior accuracy by 0.3-6.8% points, with the advantage for easy model complexity configuration by pruning the tree consisted of easy interpretable 'if-then' rules.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Databases as Topic
  • Discriminant Analysis*
  • Electrocardiography
  • Fuzzy Logic*
  • Heart Rate / physiology*
  • Humans
  • Models, Cardiovascular*
  • Reproducibility of Results
  • Sample Size

Grants and funding

SCHILLER AG, Baar, Switzerland partially founded the study with employment payment. The funder did not play a role in the study design, nor in the data collection (all databases used are open source and have not been modified). The funder helped the analysis of the resulting data and made decision to publish, as well as helped writing and counterchecked the manuscript.