File-level malware detection using byte streams

Sci Rep. 2023 Jun 1;13(1):8925. doi: 10.1038/s41598-023-36088-2.

Abstract

As more documents circulate on the Internet, detecting malware embedded in them becomes increasingly important. Malware in non-executable files may be more dangerous than malware in executables because people usually open such files without considering the risk. Recently, deep learning models have been used to analyze the byte streams of non-executables for malware detection. Although these models have shown successful results, they are commonly designed for stream-level detection rather than file-level detection. In this paper, we propose a new method that aggregates stream-level results into file-level results for malware detection. Experiments on our annotated dataset demonstrate its effectiveness, showing a gain of 3.37-5.89% in F1 score.
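To make the distinction between stream-level and file-level detection concrete, the sketch below combines per-stream malware scores into a single file-level decision. It is only an illustration of generic aggregation baselines (maximum and mean); the abstract does not specify the aggregation used by the proposed method, so nothing here should be read as that method. The function names, the 0.5 threshold, and the treatment of files with no scored streams are all assumptions made for the example.

```python
from statistics import mean


def aggregate_file_score(stream_scores, method="max"):
    """Combine per-stream malware probabilities into one file-level score.

    Hypothetical helper: 'max' and 'mean' are generic baselines,
    not the aggregation proposed in the paper.
    """
    if not stream_scores:
        # Assumption: a file with no scored streams gets the lowest score.
        return 0.0
    if method == "max":
        # The file is as suspicious as its most suspicious stream.
        return max(stream_scores)
    if method == "mean":
        # The file score is the average over all of its streams.
        return mean(stream_scores)
    raise ValueError(f"unknown aggregation method: {method!r}")


def is_malicious(stream_scores, threshold=0.5, method="max"):
    """Flag a file when its aggregated score reaches the (assumed) threshold."""
    return aggregate_file_score(stream_scores, method) >= threshold
```

With scores such as `[0.1, 0.9, 0.2]`, the two baselines disagree: the maximum flags the file, while the mean does not. This sensitivity to the aggregation rule is one reason file-level detection deserves separate treatment from stream-level detection.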