We investigate how to modify executable files to deceive malware classification systems. The main contribution of this work is a methodology that injects bytes at random positions across a malware file and that can be used both as an attack, to decrease classification accuracy, and as a defense, to augment the data available for training. The method respects the operating system's file format, so that the malware still executes after the injection and its behavior does not change. To evaluate the injection scheme, we reproduced five state-of-the-art malware classification approaches: one based on GIST+KNN, three CNN variations, and one Gated CNN. We ran our experiments on a public dataset with 9,339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop between 25% and 40% for malware family classification. This indicates that automatic malware classification systems may not be as trustworthy as initially reported in the literature. We also evaluate training on modified malware samples alongside the original ones to increase the networks' robustness against these attacks. The results show that combining the reordering of malware sections with the injection of random data can improve overall classification performance. Code is available at https://github.com/adeilsonsilva/malware-injection.
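To make the idea of random byte injection concrete, here is a minimal toy sketch in Python. It is not the authors' implementation (see the repository linked above for that). It assumes an executable can be modeled as a plain list of byte strings, one per section, and it ignores everything a real injector must handle, such as headers, section size fields, and alignment. The function name `inject_random_bytes` and its parameters are hypothetical.

```python
import random


def inject_random_bytes(sections, ratio=0.07, seed=None):
    """Toy model of random byte injection (hypothetical helper).

    `sections` is a list of bytes objects standing in for the sections of
    an executable. This does NOT parse or patch a real executable file; a
    real injector would also have to update headers and respect alignment.
    """
    if not sections:
        return []
    rng = random.Random(seed)
    total = sum(len(s) for s in sections)
    budget = int(total * ratio)  # total number of bytes to inject

    # Spread the byte budget across sections at random.
    counts = [0] * len(sections)
    for _ in range(budget):
        counts[rng.randrange(len(sections))] += 1

    # Append random padding to the end of each section, leaving the
    # original section content untouched.
    return [
        s + bytes(rng.getrandbits(8) for _ in range(n))
        for s, n in zip(sections, counts)
    ]


if __name__ == "__main__":
    original = [b"A" * 100, b"B" * 200, b"C" * 700]
    modified = inject_random_bytes(original, ratio=0.07, seed=0)
    before = sum(len(s) for s in original)
    after = sum(len(s) for s in modified)
    print(f"size before: {before}, size after: {after}")
```

Appending the padding at the end of each section keeps the original bytes contiguous, which loosely mirrors the requirement that the injection must not alter the program's behavior. The 7% default only echoes the size increase quoted in the abstract.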