Cascade interpolation learning with double subspaces and confidence disturbance for imbalanced problems

Neural Netw. 2019 Oct:118:17-31. doi: 10.1016/j.neunet.2019.06.003. Epub 2019 Jun 8.

Abstract

In this paper, a new ensemble framework named Cascade Interpolation Learning with Double subspaces and Confidence disturbance (CILDC) is designed for imbalanced classification problems. Developed from the Cascade Forest of the Deep Forest, a stacking-based tree ensemble for big-data problems with few hyper-parameters, CILDC generalizes the cascade model to a wider range of base classifiers. Specifically, CILDC integrates base classifiers through a double-subspaces strategy and random under-sampling preprocessing. Further, a simple but effective confidence disturbance technique is introduced to tune the threshold deviation for imbalanced samples: disturbance coefficients are multiplied with the confidence vectors before interpolation at each level of CILDC, so that a suitable threshold can be learned adaptively through the cascade structure. Furthermore, both Random Forest and Naive Bayes are suitable base classifiers for CILDC. Comprehensive comparison experiments on typical imbalanced datasets demonstrate both the effectiveness and the generalization ability of CILDC.
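The confidence-disturbance step described above can be illustrated with a minimal sketch: per-class disturbance coefficients scale each base classifier's confidence vector (here, boosting the minority class) before the vectors are appended to the features passed to the next cascade level. The coefficient values and the simple renormalization are illustrative assumptions; the paper's exact coefficient scheme is not given in the abstract.

```python
import numpy as np

def disturb_confidences(conf, coeffs):
    """Scale class-confidence vectors by per-class disturbance
    coefficients and renormalize each row to sum to 1.
    `coeffs` is an assumed illustrative scheme (boosting the
    minority class), not the paper's exact coefficients."""
    scaled = conf * coeffs
    return scaled / scaled.sum(axis=1, keepdims=True)

# Confidence vectors for two samples from one base classifier
# (columns: [majority class, minority class]).
conf = np.array([[0.7, 0.3],
                 [0.4, 0.6]])
coeffs = np.array([1.0, 2.0])   # boost minority-class confidence
disturbed = disturb_confidences(conf, coeffs)

# Augment the original features with the disturbed confidences
# before feeding the next cascade level (the interpolation step).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_next = np.hstack([X, disturbed])
```

Because the scaled confidences shift the effective decision threshold toward the minority class, stacking several such levels lets the cascade adapt that threshold level by level.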

Keywords: Cascade interpolation; Confidence disturbance; Ensemble learning; Imbalanced problems; Random subspaces.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Machine Learning*