Multi-class classification for highly imbalanced data is a challenging task in which multiple issues must be resolved simultaneously, including (i) accuracy on classifying highly imbalanced multi-class data; (ii) training efficiency for large data; and (iii) sensitivity to high imbalance ratio (IR). In this paper, a novel sequential ensemble learning (SEL) framework is designed to simultaneously resolve these issues. SEL framework provides a significant property over traditional AdaBoost, in which the majority samples can be divided into multiple small and disjoint subsets for training multiple weak learners without compromising accuracy (while AdaBoost cannot). To ensure the class balance and majority-disjoint property of subsets, a learning strategy called balanced and majority-disjoint subsets division (BMSD) is developed. Unfortunately it is difficult to derive a general learner combination method (LCM) for any kind of weak learner. In this work, LCM is specifically designed for extreme learning machine, called LCM-ELM. The proposed SEL framework with BMSD and LCM-ELM has been compared with state-of-the-art methods over 16 benchmark datasets. In the experiments, under highly imbalanced multi-class data (IR up to 14K; data size up to 493K), (i) the proposed works improve the performance in different measures including G-mean, macro-F, micro-F, MAUC; (ii) training time is significantly reduced.
Keywords: Highly imbalanced data; Multi-class classification; Sequential ensemble learning.
Copyright © 2020 Elsevier Ltd. All rights reserved.