Detecting and correctly classifying malicious executables has become one of the major concerns in cyber security, especially because traditional detection systems have become less effective as threats have grown in number and danger. One way to differentiate benign from malicious executables is to leverage their hexadecimal representation by creating a set of binary features that completely characterise each executable. In this paper we present a novel Bayesian nonparametric supervised learning approach for binary matrices that provides an effective probabilistic method for malware detection. Moreover, owing to the model's flexible assumptions, we are able to use it in a multi-class framework where the interest lies in classifying malware into known families. Finally, we develop a generalisation of the model that provides a deeper understanding of the behaviour of each feature across groups.