In this paper, we consider ensemble classifiers, that is, machine-learning-based classifiers that utilize a combination of scoring functions. We provide a framework for categorizing such classifiers, and we outline several ensemble techniques, discussing how each fits into our framework. From this general introduction, we then pivot to the topic of ensemble learning within the context of malware analysis. We present a brief survey of some of the ensemble techniques that have been used in malware (and related) research. We conclude with an extensive set of experiments in which we apply ensemble techniques to a large and challenging malware dataset. While many of these ensemble techniques have appeared in the malware literature, it has not previously been possible to compare such results directly, since different datasets and different measures of success are typically used. Our common framework and empirical results are an effort to bring some order to the chaos that is evident in the evolving field of ensemble learning, both within the narrow confines of the malware analysis problem and in the larger realm of machine learning in general.
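To make "a combination of scoring functions" concrete, the Python sketch below combines several component scores in two common ways: a (weighted) average of scores and a majority vote over thresholded scores. This is a minimal illustration under stated assumptions, not the framework proposed in the paper. The scorers, weights, and the 0.5 threshold are all hypothetical, and higher scores are assumed to mean "more likely malware".

```python
from typing import Callable, Optional, Sequence

# Hypothetical: a scoring function maps a feature vector to a real-valued score,
# where a higher score is assumed to mean "more likely malware".
Score = Callable[[Sequence[float]], float]


def ensemble_score(scorers: Sequence[Score], x: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine the component scores for x by a weighted average."""
    if weights is None:
        weights = [1.0] * len(scorers)  # plain (unweighted) average
    total = sum(weights)
    return sum(w * s(x) for w, s in zip(weights, scorers)) / total


def ensemble_classify(scorers: Sequence[Score], x: Sequence[float],
                      threshold: float = 0.5,
                      weights: Optional[Sequence[float]] = None) -> int:
    """Label x as 1 (malware) if the combined score exceeds the threshold."""
    return int(ensemble_score(scorers, x, weights) > threshold)


def majority_vote(scorers: Sequence[Score], x: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Each scorer casts a vote by thresholding its own score; majority wins."""
    votes = sum(int(s(x) > threshold) for s in scorers)
    return int(votes > len(scorers) / 2)


# Hypothetical component scorers over a two-feature vector.
scorers = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: (x[0] + x[1]) / 2,
]

if __name__ == "__main__":
    sample = [1.0, 0.0]
    print(ensemble_classify(scorers, sample))                         # 0
    print(ensemble_classify(scorers, sample, weights=[2.0, 1.0, 1.0]))  # 1
    print(majority_vote(scorers, sample))                             # 0
```

With equal weights the combined score for the sample is exactly 0.5, so it is not flagged; giving the first scorer twice the weight raises the combined score to 0.625 and flips the decision. This shows how the choice of combination rule, and not only the component scorers, determines the ensemble's output.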