Whether for malicious or legitimate purposes, packing, a transformation that applies operations such as compression or encryption to a binary file, for instance to make reverse engineering harder or to obfuscate code, has been widely employed for decades. Particularly in malware analysis, where evasion is a stumbling block, it has proven effective and still gives a hard time to researchers who want to detect it efficiently. Although extensively covered in the scientific literature, packing detection remains an open issue, especially with regard to the trade-off between detection time and accuracy. Many approaches, including machine learning, have been proposed, but most studies restrict their scope (i.e. to malware and PE files), rely on uncertain datasets (i.e. built with a super-detector or using labels from a questionable source) and do not provide any open implementation, which makes comparing state-of-the-art solutions tedious. Given the many challenges that packing implies, there is room for improvement in the way it is addressed, especially for static detection techniques. To tackle these challenges, we propose an experimental toolkit, aptly called the Packing Box, which leverages automation and containerization in an open source platform that brings a unified solution to the research community. We showcase it with experiments including unbiased ground truth generation, data visualization, machine learning pipeline automation and performance evaluation of open source static packing detectors.