Towards Backdoor Stealthiness in Model Parameter Space

Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine those respective spaces. However, existing backdoor attacks are typically designed to resist one specific type of defense without considering the diverse range of defense mechanisms. This observation raises a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer it, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness against 17 diverse, representative defenses. Surprisingly, we reveal a critical blind spot: backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining the backdoored models in parameter space. To investigate the causes of this common vulnerability, we study the characteristics of backdoor attacks in parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, a property that current backdoor attacks do not thoroughly consider. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits parameter changes through a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases parameter-space stealthiness during backdoor injection. Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR-10, GTSRB, and a subset of ImageNet. In addition, we show that ABI consistently improves the effectiveness of common backdoor attacks.
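The key finding above is that input- and feature-space backdoors leave prominent backdoor-related neurons that are visible in parameter space. As a hedged illustration of what "examining backdoored models in parameter space" can mean, the sketch below scores each convolutional output channel by the spectral norm of its weight matrix and flags outliers, in the spirit of the data-free defense Channel Lipschitzness-based Pruning (CLP). It is not Grond's, ABI's, or the paper's procedure. The function names, the threshold multiplier `u=3.0`, and the nested-list weight format (one `in_channels x k*k` matrix per output channel) are illustrative assumptions, and the batch-norm rescaling that CLP folds into the weights is omitted for brevity.

```python
import math
import statistics


def spectral_norm(mat, iters=200):
    """Largest singular value of a dense matrix (list of rows), via power iteration."""
    rows, cols = len(mat), len(mat[0])
    # Non-uniform start vector to reduce the chance of starting orthogonal
    # to the top right-singular vector (a sketch, not a robust solver).
    v = [1.0 / (j + 1) for j in range(cols)]
    for _ in range(iters):
        u = [sum(mat[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        w = [sum(mat[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = W^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    wv = [sum(mat[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in wv))


def flag_suspicious_channels(channel_weights, u=3.0):
    """Hypothetical helper: score each output channel by the spectral norm of its
    weight matrix and flag channels scoring above mean + u * std of the layer."""
    scores = [spectral_norm(m) for m in channel_weights]
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    threshold = mean + u * std
    flagged = [idx for idx, s in enumerate(scores) if s > threshold]
    return flagged, scores


# Hypothetical layer: 15 ordinary channels and one channel with an outsized norm.
normal = [[0.1, 0.0], [0.0, 0.1]]
outlier = [[3.0, 4.0], [0.0, 0.0]]
flagged, scores = flag_suspicious_channels([normal] * 15 + [outlier])
print(flagged)  # [15]
```

The mean-plus-`u`-standard-deviations rule assumes most channels in a layer are benign, so a few backdoor-related channels stand out. An attack that limits parameter changes, as ABI is described as doing, aims to avoid producing such outliers in the first place.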