Accurate and efficient sequential ensemble learning for highly imbalanced multi-class data

Neural Netw. 2020 Aug:128:268-278. doi: 10.1016/j.neunet.2020.05.010. Epub 2020 May 19.

Abstract

Multi-class classification of highly imbalanced data is a challenging task in which multiple issues must be resolved simultaneously, including (i) accuracy in classifying highly imbalanced multi-class data; (ii) training efficiency on large data; and (iii) sensitivity to a high imbalance ratio (IR). In this paper, a novel sequential ensemble learning (SEL) framework is designed to resolve these issues simultaneously. The SEL framework offers a significant advantage over traditional AdaBoost: the majority samples can be divided into multiple small, disjoint subsets for training multiple weak learners without compromising accuracy (which AdaBoost cannot do). To ensure that the subsets are class-balanced and majority-disjoint, a learning strategy called balanced and majority-disjoint subsets division (BMSD) is developed. Unfortunately, it is difficult to derive a general learner combination method (LCM) for any kind of weak learner. In this work, an LCM is designed specifically for the extreme learning machine (ELM), called LCM-ELM. The proposed SEL framework with BMSD and LCM-ELM has been compared with state-of-the-art methods on 16 benchmark datasets. In the experiments, on highly imbalanced multi-class data (IR up to 14K; data size up to 493K), (i) the proposed methods improve performance on several measures, including G-mean, macro-F, micro-F, and MAUC; and (ii) training time is significantly reduced.
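The idea of dividing the majority samples into class-balanced, majority-disjoint subsets can be illustrated with a minimal Python sketch. This is not the paper's BMSD algorithm, whose details the abstract does not give. The function name `balanced_disjoint_subsets`, the parameter `n_subsets`, and the rule of treating only the single largest class as the majority (and copying every other class into each subset) are assumptions made for illustration.

```python
import numpy as np


def balanced_disjoint_subsets(y, n_subsets, seed=0):
    """Return a list of index arrays, one per subset.

    Assumption (not from the paper): the largest class is the majority
    class. Its indices are shuffled and partitioned into disjoint chunks;
    all samples of the remaining classes are copied into every subset.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]

    # Shuffle the majority indices, then cut them into disjoint chunks.
    majority_idx = rng.permutation(np.flatnonzero(y == majority))
    chunks = np.array_split(majority_idx, n_subsets)

    # Every subset reuses all non-majority samples.
    rest_idx = np.flatnonzero(y != majority)
    return [np.concatenate([chunk, rest_idx]) for chunk in chunks]


# Toy labels: class 0 has 90 samples, classes 1 and 2 have 10 each.
y_demo = np.array([0] * 90 + [1] * 10 + [2] * 10)
subsets_demo = balanced_disjoint_subsets(y_demo, n_subsets=9)
for subset in subsets_demo[:2]:
    # Per-class sample counts inside the subset.
    print(np.bincount(y_demo[subset]))
```

In this sketch each weak learner would be trained on one index array, so no majority sample is seen by more than one learner. How the resulting learners are combined (the role of LCM-ELM in the paper) is outside what this example covers.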

Keywords: Highly imbalanced data; Multi-class classification; Sequential ensemble learning.

MeSH terms

  • Data Collection / standards
  • Machine Learning / standards*