DAEMON: Dataset-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining

Numerous metamorphic and polymorphic malicious variants are generated automatically every day by mutation engines. These engines transform the code of a malicious program while retaining its functionality, in order to evade signature-based detection. These automatic processes have greatly increased the number of malware variants, rendering their fully manual analysis impossible. Malware classification is the task of determining the family to which a new malicious variant belongs. Since variants of the same malware family exhibit similar behavioral patterns, classifying newly discovered malicious programs and applications helps assess the risks they pose. Moreover, malware classification helps determine which newly discovered variants should undergo manual analysis by a security expert. Such analysis establishes whether they belong to a new family (e.g., one whose members exploit a zero-day vulnerability) or are simply the result of concept drift within a known malicious family. This has motivated intense research in recent years on devising highly accurate automatic tools for malware classification. In this work, we present DAEMON, a novel dataset-agnostic malware classifier. A key property of DAEMON is that the type of features it uses and the manner in which they are mined facilitate understanding of the distinctive behavior of malware families, making its classification decisions explainable. We optimized DAEMON using a large-scale dataset of x86 binaries belonging to a mix of several malware families that target computers running Windows. We then retrained it and applied it, without any algorithmic change, feature re-engineering, or parameter tuning, to two other large-scale datasets of malicious Android applications comprising numerous malware families. DAEMON obtained highly accurate classification results on all datasets, establishing that it is also platform-agnostic.
