Packing is an obfuscation technique widely used by malware to hide the content and behavior of a program. Much prior research has explored how to detect whether a program is packed, using a broad variety of approaches such as entropy analysis, syntactic signatures, and, more recently, machine learning classifiers built on various features. However, no robust results have indicated which algorithms perform best or which features are most significant. The question is further complicated by the choice of evaluation criteria, since accuracy, cost, generalization capability, and other measures are all reasonable. This work explores eleven machine learning approaches using 119 features to understand which features are most significant for packing detection, which algorithms offer the best performance, and which algorithms are most economical.
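As an illustration of the entropy analysis mentioned among the prior approaches, the following is a minimal sketch of a byte-entropy heuristic. It is not the method evaluated in this work: the function names `byte_entropy` and `looks_packed` and the threshold of 7.0 bits per byte are assumptions chosen for illustration only. The underlying idea is that compressed or encrypted content tends to have a near-uniform byte distribution, and therefore high Shannon entropy.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value present in data
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical heuristic: flag data whose entropy meets the threshold.

    The default threshold is an illustrative assumption, not a value
    taken from the paper.
    """
    return byte_entropy(data) >= threshold
```

In practice such a check would typically be applied per section of an executable rather than to the whole file, and a single threshold is easy to evade or to trigger falsely, which is one reason classifiers over many features have been explored.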