Backdoor attacks intend to inject a hidden backdoor into deep neural networks (DNNs), such that the prediction of the infected model will be maliciously changed if the hidden backdoor is activated by an attacker-defined trigger. Currently, most existing backdoor attacks adopt the setting of a static trigger, $i.e.$, the trigger across training and testing images has the same appearance and is located in the same area. In this paper, we revisit this attack paradigm by analyzing the characteristics of the trigger. We demonstrate that this attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training. As such, those attacks are far less effective in the physical world, where the location and appearance of the trigger in the digitized image may differ from those of the trigger used for training. Moreover, we also discuss how to alleviate such vulnerability. We hope that this work can inspire more explorations of backdoor properties, to help the design of more advanced backdoor attack and defense methods.
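To make the static-trigger setting concrete, below is a minimal illustrative sketch (not the paper's exact procedure) of BadNets-style data poisoning: a fixed-appearance patch is stamped at a fixed location on a fraction of the training images, which are then relabeled to the attacker-chosen target class. The function names, patch size, and poisoning rate are illustrative assumptions.

```python
import numpy as np

def add_static_trigger(image: np.ndarray, patch_size: int = 3, value: float = 1.0) -> np.ndarray:
    """Stamp a solid square patch at a fixed (bottom-right) location.

    The appearance (solid patch) and location (bottom-right corner) never
    change across training and testing images, which is the "static
    trigger" setting discussed in the abstract.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, ...] = value
    return poisoned

def poison_dataset(images: np.ndarray, labels: np.ndarray, target_label: int,
                   poison_rate: float = 0.1, seed: int = 0):
    """Stamp the trigger onto a random fraction of the training images and
    relabel them to the attacker-chosen target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    for i in idx:
        images[i] = add_static_trigger(images[i])
        labels[i] = target_label
    return images, labels
```

Under this setting, a model trained on the poisoned data behaves normally on clean inputs but predicts the target class whenever the exact patch appears in the exact location; the paper's observation is that shifting or altering the patch at test time, as commonly happens after physical-world capture, weakens the attack.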