Malware detection that operates directly on binary executables is widely applicable, since it works even on binaries that have not been disassembled or decompiled. However, a binary-level approach can suffer from ambiguity problems. In this paper, we propose a new feature engineering technique that uses minimal knowledge about the internal layout of a binary. The proposed feature avoids the ambiguity problems by integrating the layout information with structural entropy. Experimental results show that our feature improves accuracy and F1-score by 3.3% and 0.07, respectively, on a CNN-based malware detector evaluated with realistic benign and malicious samples.
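To make the idea of combining structural entropy with layout information concrete, the following is a minimal sketch, not the feature proposed in the paper. It assumes that structural entropy means the Shannon entropy of consecutive fixed-size byte chunks, and that the layout is available as a list of named byte ranges. The function names, the 256-byte chunk size, and the `sections` format are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def structural_entropy(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of each consecutive fixed-size chunk (chunk size is an assumption)."""
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def label_chunks(
    data: bytes,
    sections: list[tuple[str, int, int]],
    chunk_size: int = 256,
) -> list[tuple[str, float]]:
    """Pair each chunk's entropy with the section that contains the chunk start.

    `sections` is a hypothetical layout description: (name, start, end) byte
    offsets with an exclusive end. Chunks outside every range get "unknown".
    """
    labeled = []
    for index, entropy in enumerate(structural_entropy(data, chunk_size)):
        offset = index * chunk_size
        name = next(
            (n for n, start, end in sections if start <= offset < end),
            "unknown",
        )
        labeled.append((name, entropy))
    return labeled
```

In this sketch, a chunk of identical bytes yields an entropy of 0 and a chunk containing every byte value exactly once yields 8, so the section label lets a downstream model tell apart regions whose entropy values would otherwise look alike.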