Robust Online Multilabel Learning Under Dynamic Changes in Data Distribution With Labels

IEEE Trans Cybern. 2020 Jan;50(1):374-385. doi: 10.1109/TCYB.2018.2869476. Epub 2019 May 17.

Abstract

In this paper, a robust online multilabel learning method for dynamically changing multilabel data streams is proposed. The proposed method has three advantages: 1) higher accuracy due to a newly defined objective function based on label ranking; 2) fast training and updating based on a newly derived closed-form (rather than gradient-descent-based) solution for the new objective function; and 3) high robustness to a newly identified type of concept drift in multilabel data streams, namely, changes in data distribution with labels (CDDL). The high robustness stems from two novel components: 1) a new sequential update rule that preserves the label ranking information learned from all old (but discarded) samples while updating the model based only on new incoming samples and 2) a fixed threshold for label bipartition that is insensitive to any kind of change in data distribution, including CDDL. The proposed method has been evaluated on 13 benchmark datasets from various domains. As shown in the experimental results, the proposed method is highly robust to CDDL in both the sequential model update and multilabel thresholding. Furthermore, the proposed method improves performance on several evaluation measures, including Hamming loss, F1-measure, Precision, and Recall, while requiring short training time on most of the evaluated datasets.
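The abstract does not state the objective function or the update rule, so the following is only a minimal sketch of the general idea of a closed-form sequential update combined with a fixed bipartition threshold. It substitutes ordinary regularized least squares, updated by recursive least squares (RLS), for the paper's ranking-based objective. The class name `SequentialMultilabelRLS`, the ±1 label encoding, the regularization value, the threshold of 0, and the toy stream are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative stand-in only: recursive least squares (RLS) on +/-1 label
# targets. This is NOT the ranking-based objective or update rule of the paper.

class SequentialMultilabelRLS:
    def __init__(self, n_features, n_labels, reg=1.0, threshold=0.0):
        self.d = n_features
        self.L = n_labels
        self.threshold = threshold  # fixed once, never re-tuned on new data
        # P = (X^T X + reg * I)^-1 summarizes every sample seen so far,
        # so old samples can be discarded after they are processed.
        self.P = [[(1.0 / reg if i == j else 0.0) for j in range(n_features)]
                  for i in range(n_features)]
        # W maps features to one real-valued score per label.
        self.W = [[0.0] * n_labels for _ in range(n_features)]

    def scores(self, x):
        return [sum(self.W[i][l] * x[i] for i in range(self.d))
                for l in range(self.L)]

    def update(self, x, y):
        """Closed-form rank-one update from one sample (x: floats, y: +1/-1)."""
        Px = [sum(self.P[i][j] * x[j] for j in range(self.d))
              for i in range(self.d)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(self.d))
        gain = [v / denom for v in Px]
        s = self.scores(x)
        err = [y[l] - s[l] for l in range(self.L)]
        for i in range(self.d):
            for l in range(self.L):
                self.W[i][l] += gain[i] * err[l]
            for j in range(self.d):
                # Sherman-Morrison update of the inverse (P is symmetric).
                self.P[i][j] -= gain[i] * Px[j]

    def predict(self, x):
        # Label bipartition with the fixed threshold.
        return [1 if s > self.threshold else -1 for s in self.scores(x)]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(t) for t in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total


# Hypothetical toy stream: two features plus a constant bias feature, two labels.
stream = [
    ([1.0, 0.0, 1.0], [1, -1]),
    ([0.0, 1.0, 1.0], [-1, 1]),
    ([1.0, 1.0, 1.0], [1, 1]),
    ([0.0, 0.0, 1.0], [-1, -1]),
]

model = SequentialMultilabelRLS(n_features=3, n_labels=2, reg=0.1)
for x, y in stream:
    model.update(x, y)  # each sample is used once and could then be dropped

preds = [model.predict(x) for x, _ in stream]
print(hamming_loss([y for _, y in stream], preds))
```

After the whole stream has been processed, the weights in this sketch coincide with the batch ridge-regression solution over all samples, which is the sense in which a closed-form sequential update can retain information from discarded data. Whether the paper's rule has the same form cannot be determined from the abstract.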