Analyzing large volumes of malware is a major burden for security analysts. Because emerging malware is often a variant of existing malware, automatically classifying samples into known families relieves part of that burden. Image-based malware classification with deep learning is attractive for its simplicity, versatility, and compatibility with the latest techniques. However, how the choice of deep learning model and the degree of transfer learning affect classification accuracy on malware variants has not been fully studied. In this paper, we conduct an exhaustive survey of 24 ImageNet pre-trained models and five fine-tuning parameters, 120 combinations in total, on two platforms. We find that the highest classification accuracy is obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning, and we achieve the highest classification accuracy to date in cross-validation on the Malimg and Drebin datasets. We also confirm that this trend holds for recent malware variants using the VirusTotal 2020 Windows and Android datasets. The experimental results suggest that it is effective to periodically search for the optimal deep learning model using the latest models and malware datasets, gradually reducing the degree of transfer learning starting from half.
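The search space described in the abstract (24 pre-trained models crossed with five fine-tuning parameters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the 24 models or give the five parameter values, so the model names are placeholders, and the "degree of transfer learning" is interpreted here as the fraction of layers kept frozen at their ImageNet weights. The helper names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not list the 24 models.
MODEL_NAMES = [f"model_{i:02d}" for i in range(24)]

# Assumed values: fraction of layers kept frozen at their pre-trained weights.
# The abstract only states that there are five fine-tuning parameters.
FROZEN_FRACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]


def frozen_layer_count(num_layers: int, frozen_fraction: float) -> int:
    """Return how many leading layers stay frozen for a given fraction."""
    if not 0.0 <= frozen_fraction <= 1.0:
        raise ValueError("frozen_fraction must lie in [0, 1]")
    return int(num_layers * frozen_fraction)


def build_search_grid(models, fractions):
    """Enumerate every (model, frozen fraction) pair to be evaluated."""
    return list(product(models, fractions))


def fractions_from_half_downward(fractions):
    """Order candidates from half frozen toward fully fine-tuned."""
    return sorted((f for f in fractions if f <= 0.5), reverse=True)


if __name__ == "__main__":
    grid = build_search_grid(MODEL_NAMES, FROZEN_FRACTIONS)
    print(len(grid), "combinations")
    print(fractions_from_half_downward(FROZEN_FRACTIONS))
```

Under these assumptions, a periodic re-evaluation would walk the candidates returned by `fractions_from_half_downward` for each newly available model, training and cross-validating one configuration per grid entry. The training and evaluation steps themselves are left out because the abstract does not specify them.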