Extensive evidence has demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks, which motivates the development of backdoor detection methods. Existing backdoor detection methods are typically tailored to individual, specific types of backdoor attacks (e.g., patch-based or perturbation-based). In practice, however, adversaries are likely to deploy multiple types of backdoor attacks, which challenges current detection strategies. Based on the observation that adversarial perturbations are highly correlated with trigger patterns, this paper proposes the Adaptive Perturbation Generation (APG) framework to detect multiple types of backdoor attacks by adaptively injecting adversarial perturbations. Since different trigger patterns exhibit highly diverse behaviors under the same adversarial perturbations, we first design a global-to-local strategy that fits multiple types of backdoor triggers by adjusting the region and budget of the attacks. To further improve the efficiency of perturbation injection, we introduce a gradient-guided mask generation strategy that searches for the optimal regions for adversarial attacks. Extensive experiments conducted on multiple datasets (CIFAR-10, GTSRB, Tiny-ImageNet) demonstrate that our method outperforms state-of-the-art baselines by a large margin (+12%).
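The core mechanism summarized above, gradient-guided mask generation followed by a region- and budget-restricted adversarial perturbation, can be illustrated with a minimal PyTorch sketch. The function names, the top-k ratio, and the PGD budget below are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch (assumptions, not the APG implementation): pick the attack
# region from input-gradient saliency, then run a PGD-style perturbation
# restricted to that region under a given budget.
import torch
import torch.nn.functional as F

def gradient_guided_mask(model, x, y, ratio=0.1):
    """Select the pixels with the largest input-gradient magnitude as the attack region."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]                     # input gradient
    saliency = grad.abs().sum(dim=1, keepdim=True)             # aggregate over channels
    k = max(1, int(ratio * saliency[0].numel()))
    thresh = saliency.flatten(1).topk(k, dim=1).values[:, -1]  # per-sample threshold
    return (saliency >= thresh.view(-1, 1, 1, 1)).float()      # 1 = perturb, 0 = keep

def masked_pgd(model, x, y, mask, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD perturbation restricted to the masked region, projected to budget eps."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign() * mask    # attack only masked pixels
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)          # project to the budget
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

Under this sketch, a sample or model whose predictions flip only under such region-restricted, budget-limited perturbations can be flagged as suspicious; varying `ratio` and `eps` corresponds to the global-to-local adjustment of attack region and budget described in the abstract.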