Universal Backdoor Attacks Detection via Adaptive Adversarial Probe

Extensive evidence has demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks, which motivates the development of backdoor attack detection. Most detection methods are designed to verify whether a model is infected with presumed types of backdoor attacks. In practice, however, the adversary is likely to generate diverse backdoor attacks that defenders have not foreseen, which challenges current detection strategies. In this paper, we focus on this more challenging scenario and propose a universal backdoor attack detection method named Adaptive Adversarial Probe (A2P). Specifically, we posit that the challenge of universal backdoor attack detection lies in the fact that different backdoor attacks often exhibit diverse characteristics in trigger patterns (i.e., sizes and transparencies). Therefore, A2P adopts a global-to-local probing framework, which adversarially probes images with adaptive regions and budgets to fit backdoor triggers of different sizes and transparencies. Regarding the probing region, we propose an attention-guided region generation strategy that generates region proposals with different sizes and locations based on the attention of the target model, since trigger regions often manifest higher model activation. Regarding the attack budget, we introduce box-to-sparsity scheduling, which iteratively increases the perturbation budget from a box constraint to a sparse constraint, so that latent backdoors with different transparencies can be better activated. Extensive experiments on multiple datasets (CIFAR-10, GTSRB, Tiny-ImageNet) demonstrate that our method outperforms state-of-the-art baselines by large margins (+12%).
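
The two adaptive components described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so every function name (`region_proposals`, `box_mask`, `budget_schedule`, `project`), the peak-centred boxes, the linear schedule, and all default values are assumptions chosen for illustration. The sketch reads "box constraint" as a small per-pixel bound applied to every pixel in the region, and "sparse constraint" as a large per-pixel bound applied to only a few pixels. The adversarial optimisation against the target model is left out.

```python
import numpy as np


def region_proposals(attention, sizes=(4, 8, 16)):
    """Assumed strategy: one square box per size, centred on the attention peak.

    Returns a list of (top, left, size) tuples clipped to the image bounds.
    """
    h, w = attention.shape
    cy, cx = np.unravel_index(np.argmax(attention), attention.shape)
    boxes = []
    for s in sizes:
        s = min(s, h, w)
        top = int(np.clip(cy - s // 2, 0, h - s))
        left = int(np.clip(cx - s // 2, 0, w - s))
        boxes.append((top, left, s))
    return boxes


def box_mask(shape, box):
    """Binary mask that is 1 inside the (top, left, size) box and 0 elsewhere."""
    top, left, s = box
    mask = np.zeros(shape)
    mask[top:top + s, left:left + s] = 1.0
    return mask


def budget_schedule(steps, eps_min=2 / 255, eps_max=1.0, frac_max=1.0, frac_min=0.05):
    """Assumed linear schedule from a box-like to a sparse-like budget.

    Each step is (eps, frac): eps is the per-pixel bound, frac is the fraction
    of region pixels allowed to change. eps grows while frac shrinks.
    """
    t = np.linspace(0.0, 1.0, steps)
    eps = eps_min + t * (eps_max - eps_min)
    frac = frac_max + t * (frac_min - frac_max)
    return list(zip(eps.tolist(), frac.tolist()))


def project(delta, mask, eps, frac):
    """Project a perturbation onto the current budget.

    Zeroes everything outside the mask, keeps only the k largest-magnitude
    entries (k = frac * number of masked pixels, at least 1), then clips the
    survivors to [-eps, eps].
    """
    masked = delta * mask
    k = max(1, int(round(frac * mask.sum())))
    keep = np.argpartition(np.abs(masked).ravel(), -k)[-k:]
    sparse = np.zeros(masked.size)
    sparse[keep] = masked.ravel()[keep]
    return np.clip(sparse, -eps, eps).reshape(delta.shape)
```

In a full probe, each proposed region would be attacked under each budget in turn, with `project` applied after every gradient step. How the resulting attack behaviour is turned into a detection decision is not stated in the abstract, so it is not sketched here.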

