Anomaly detection in virtual machine logs against irrelevant attribute interference

Hao Zhang; Yun Zhou; Huahu Xu; Jiangang Shi; Xinhua Lin; Yiqin Gao

doi:10.1371/journal.pone.0315897

PLoS One. 2025 Jan 7;20(1):e0315897. doi: 10.1371/journal.pone.0315897. eCollection 2025.

Authors

Hao Zhang¹, Yun Zhou², Huahu Xu¹, Jiangang Shi³, Xinhua Lin⁴, Yiqin Gao⁴

Affiliations

¹ School of Computer Engineering and Science, Shanghai University, Shanghai, China.
² Shanghai KingLong IoT Co., Ltd., Shanghai, China.
³ Shanghai Shangda Hairun Information System Co., Ltd., Shanghai, China.
⁴ Shanghai Jiao Tong University, Shanghai, China.

PMID: 39774385
DOI: 10.1371/journal.pone.0315897

Abstract

Virtual machine logs are generated in large quantities and may contain abnormal logs that indicate security risks or system failures of the virtual machine platform. Using unsupervised anomaly detection methods to identify abnormal logs is therefore a meaningful task. However, collecting accurate anomaly logs in the real world is often challenging, and log information contains inherent noise. Log parsing and anomaly alerting can be time-consuming, so it is important to improve their effectiveness and accuracy. To address these challenges, this paper proposes a method called LADSVM (Long Short-Term Memory + Autoencoder-Decoder + SVM). First, a log parsing algorithm is used to parse the logs. Then, a feature extraction algorithm that combines Long Short-Term Memory and Autoencoder-Decoder is applied to extract features. The Autoencoder-Decoder reduces the dimensionality of the data by mapping the high-dimensional input to a low-dimensional latent space, which helps eliminate redundant information and noise, extract key features, and increase robustness. Finally, a Support Vector Machine is applied to the resulting feature vectors to detect anomalies. Experimental results demonstrate that, compared to traditional methods, this approach learns better features without any prior knowledge and exhibits superior noise robustness and performance. The LADSVM approach excels at detecting anomalies in virtual machine logs characterized by strong sequential patterns and noise, but its performance may vary when applied to disordered log data. This highlights the need to select detection methods that match the specific characteristics of each type of log data.
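The first stage of the pipeline described in the abstract, log parsing, can be illustrated with a minimal Python sketch. The abstract does not name the parsing algorithm, so everything here is an assumption for illustration only: the regex masking rules, the function names `to_template`, `parse_logs`, and `sliding_windows`, and the sample log lines are hypothetical and are not taken from the paper. The sketch turns raw messages into integer event IDs and cuts them into fixed-length windows, the kind of sequential input a Long Short-Term Memory feature extractor would typically consume.

```python
import re

# Hypothetical masking rules (an assumption, not the paper's parser).
# Order matters: IP addresses and hex values are masked before plain numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Replace variable-looking tokens with placeholders."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def parse_logs(messages):
    """Map each message to an integer event ID, one ID per distinct template."""
    template_ids = {}
    event_ids = []
    for message in messages:
        template = to_template(message)
        if template not in template_ids:
            template_ids[template] = len(template_ids)
        event_ids.append(template_ids[template])
    return event_ids, template_ids


def sliding_windows(event_ids, size, step=1):
    """Cut the event-ID sequence into fixed-length windows."""
    last_start = len(event_ids) - size
    return [event_ids[i:i + size] for i in range(0, last_start + 1, step)]


# Made-up sample messages, for illustration only.
logs = [
    "vm 12 started on host 10.0.0.5",
    "vm 13 started on host 10.0.0.6",
    "vm 12 disk read error at 0x1f3a",
    "vm 13 stopped",
]

event_ids, templates = parse_logs(logs)
windows = sliding_windows(event_ids, size=2)
print(event_ids)
print(windows)
```

In a full pipeline of the kind the abstract outlines, each window would then be encoded by the sequence model, compressed into a low-dimensional latent vector, and passed to the Support Vector Machine. Those stages are omitted here because the abstract gives no architectural details.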

Copyright: © 2025 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms*
Humans
Support Vector Machine