Research in malware classification often relies on machine learning models trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this paper, we train and evaluate machine learning models for malware classification using features that can be obtained without disassembling or executing code. Specifically, we visualize malware samples as images and employ image analysis techniques. In this context, we focus on two machine learning models, namely, Convolutional Neural Networks (CNNs) and Extreme Learning Machines (ELMs). Surprisingly, we find that ELMs can achieve accuracies on par with CNNs, yet ELM training requires less than 2% of the time needed to train a comparable CNN.
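The two ideas named above, treating raw bytes as an image and training an ELM, can be illustrated with a minimal sketch. It is not the paper's implementation: the names `bytes_to_image` and `ELM`, the fixed image size, the tanh activation, the weight scaling, and the hidden-layer width are all assumptions made for illustration. The sketch only shows why ELM training can be cheap: the hidden weights are drawn at random and never updated, so fitting reduces to a single least-squares solve for the output weights.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 32, height: int = 32) -> np.ndarray:
    """Map raw file bytes to a fixed-size grayscale image with values in [0, 1].

    No disassembly or execution is involved; each byte becomes one pixel.
    The 32x32 default is an arbitrary illustrative choice.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    n = width * height
    # Truncate or zero-pad so every sample yields the same number of pixels.
    padded = np.zeros(n, dtype=np.uint8)
    padded[: min(n, buf.size)] = buf[:n]
    return padded.reshape(height, width).astype(np.float64) / 255.0


class ELM:
    """Single-hidden-layer ELM (illustrative): random fixed hidden weights,
    output weights obtained in closed form via the pseudo-inverse."""

    def __init__(self, n_hidden: int = 64, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer activations; tanh is an assumed choice of nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Random hidden weights, scaled to limit tanh saturation (assumption).
        self.W = self.rng.normal(scale=1.0 / np.sqrt(n_features),
                                 size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot encode integer labels
        # The only "training" step: least-squares solve for output weights.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

To use the sketch, each sample would be converted with `bytes_to_image(...)`, flattened with `.ravel()`, stacked into a matrix `X` with integer class labels `y`, and passed to `ELM().fit(X, y)`. A CNN, by contrast, would consume the two-dimensional images directly and be trained iteratively by gradient descent.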