Training pipelines for machine learning (ML)-based malware classification often rely on crowdsourced threat feeds, exposing a natural attack injection point. In this paper, we study the susceptibility of feature-based ML malware classifiers to backdoor poisoning attacks, specifically focusing on challenging "clean-label" attacks where attackers do not control the sample labeling process. We propose the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers in a model-agnostic fashion. Using multiple reference datasets for malware classification, including Windows PE files, PDFs, and Android applications, we demonstrate effective attacks against a diverse set of machine learning models and evaluate the effect of various constraints imposed on the attacker. To demonstrate the feasibility of our backdoor attacks in practice, we create a watermarking utility for Windows PE files that preserves the binary's functionality, and we leverage similar behavior-preserving alteration methodologies for Android and PDF files. Finally, we experiment with potential defensive strategies and show the difficulty of fully defending against these attacks, especially when the trigger blends in with the legitimate sample distribution.
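The explanation-guided trigger construction described above can be illustrated with a minimal sketch. This is not the paper's implementation: it uses absolute feature-label correlation on synthetic tabular data as a stand-in for the explainability-based attribution scores, and the trigger values are set to medians of the benign class so the watermark blends into the legitimate (goodware) distribution, matching the clean-label setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular "malware" dataset: 6 features, label 1 = malicious.
# Features 0 and 1 are informative; the rest are noise.
n = 400
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))
X[:, 0] += 2.0 * y
X[:, 1] -= 1.5 * y

def feature_importance(X, y):
    # Stand-in for an explanation-based attribution (e.g. SHAP-style
    # global importance): absolute correlation of each feature with
    # the label. Higher score = more influential feature.
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def build_trigger(X, y, k=2):
    # Select the k most important features and assign each the median
    # value observed in the benign class, so the trigger lies inside
    # the legitimate sample distribution (clean-label constraint).
    idx = np.argsort(feature_importance(X, y))[::-1][:k]
    vals = np.median(X[y == 0][:, idx], axis=0)
    return idx, vals

def poison(X, idx, vals):
    # Stamp the watermark onto a batch of (benign-labeled) samples.
    Xp = X.copy()
    Xp[:, idx] = vals
    return Xp

idx, vals = build_trigger(X, y)
poisoned = poison(X[y == 0][:10], idx, vals)
```

In the full attack, the poisoned benign samples are contributed to the crowdsourced training feed with their correct (benign) labels; a model trained on them learns to associate the watermark with the benign class, so a malicious file carrying the same trigger evades detection at test time.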