Selective ensemble method for anomaly detection based on parallel learning

Sci Rep. 2024 Jan 16;14(1):1420. doi: 10.1038/s41598-024-51849-3.

Abstract

Anomaly detection is an important task in data analysis. Traditional anomaly detection approaches often depend strongly on the size, structure and features of the data, whereas applying ensemble learning to anomaly detection can greatly improve generalization ability. Ensemble-based anomaly detection methods still face challenges, however, such as data imbalance, high time and space demands, and the selection of base detectors. To address these challenges, we propose a selective ensemble method for anomaly detection based on parallel learning (SEAD-PL). First, a differentiated stratified sampling method is designed to alleviate the problem of data imbalance. Then, a distributed parallel training framework is built to reduce the excessive time and space consumed by base detector training. Finally, a clustering-based ensemble selection strategy is introduced to balance the accuracy and diversity of the base detectors. Experiments on six datasets demonstrate that the proposed method has clear advantages over four selected methods.
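The three stages named in the abstract can be illustrated with a minimal sketch. It is not the SEAD-PL implementation: the abstract gives no algorithmic details, so every concrete choice here is an assumption made for illustration. These include the per-class sampling rates, the toy centroid-distance base detector, the thread pool standing in for a distributed training framework, k-means over detector score vectors as the clustering step, and pairwise AUC as the accuracy measure. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def euclid(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def stratified_sample(X, y, rates, rng):
    """Stage 1 (assumed form): sample each class at its own rate."""
    picked = []
    for label, rate in rates.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        picked += rng.sample(idx, max(1, round(rate * len(idx))))
    return [X[i] for i in picked], [y[i] for i in picked]

def train_detector(job):
    """Toy base detector: the centroid of the sampled normal points."""
    X, y, seed = job
    # Hypothetical rates: keep 30% of normals (0) and all anomalies (1).
    Xs, ys = stratified_sample(X, y, {0: 0.3, 1: 1.0}, random.Random(seed))
    normals = [x for x, lab in zip(Xs, ys) if lab == 0]
    return tuple(sum(col) / len(normals) for col in zip(*normals))

def score(detector, X):
    """Anomaly score: distance from the detector's centroid."""
    return [euclid(detector, x) for x in X]

def pairwise_auc(scores, y):
    """Fraction of (anomaly, normal) pairs that are ranked correctly."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kmeans_labels(vectors, k, iters=10):
    """Plain k-means with a deterministic init (the first k vectors)."""
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclid(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_detectors(detectors, X_val, y_val, k):
    """Stage 3 (assumed form): cluster detectors by their score vectors,
    then keep the most accurate detector from each cluster."""
    vecs = [score(d, X_val) for d in detectors]
    labels = kmeans_labels(vecs, k)
    chosen = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        best = max(members, key=lambda i: pairwise_auc(vecs[i], y_val))
        chosen.append(detectors[best])
    return chosen

def run_demo(seed=0, n_detectors=8, k=3):
    rng = random.Random(seed)
    # Synthetic imbalanced data: 200 normal points, 10 anomalies.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    X += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(10)]
    y = [0] * 200 + [1] * 10
    # Stage 2 (stand-in): train base detectors concurrently.
    jobs = [(X, y, seed + i) for i in range(n_detectors)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        detectors = list(pool.map(train_detector, jobs))
    # The toy reuses the training data for selection and evaluation.
    chosen = select_detectors(detectors, X, y, k)
    per_detector = [score(d, X) for d in chosen]
    ensemble = [sum(col) / len(chosen) for col in zip(*per_detector)]
    return len(chosen), pairwise_auc(ensemble, y)

if __name__ == "__main__":
    n_selected, auc = run_demo()
    print(f"selected {n_selected} detectors, ensemble AUC = {auc:.3f}")
```

Choosing one detector per cluster is one simple way to trade accuracy against diversity: detectors in the same cluster produce similar scores, so keeping only the best of each removes redundancy while retaining dissimilar detectors. A real system would use separate validation data for selection and a process pool or cluster scheduler rather than threads.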