Adversarial patch attacks create adversarial examples by injecting arbitrary distortions within a bounded region of the input to fool deep neural networks (DNNs). These attacks are robust (i.e., physically realizable) and universally malicious, and hence represent a severe security threat to real-world DNN-based systems. We propose Jujutsu, a two-stage technique to detect and mitigate robust and universal adversarial patch attacks. We first observe that adversarial patches are crafted as localized features that exert a large influence on the prediction output and continue to dominate the prediction on any input. Jujutsu leverages this observation for accurate attack detection with low false positives. Patch attacks corrupt only a localized region of the input, while the majority of the input remains unperturbed. Jujutsu therefore leverages generative adversarial networks (GANs) to perform localized attack recovery: it synthesizes the semantic contents of the input that were corrupted by the attack and reconstructs a "clean" input for correct prediction. We evaluate Jujutsu on four diverse datasets spanning eight different DNN models and find that it significantly outperforms four existing defenses. We further evaluate Jujutsu against physical-world attacks, as well as adaptive attacks.
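The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the influential region is located or how the hold-out check is performed, so the saliency map, the fixed window size, the single hold-out image, and all function names here are assumptions made for illustration. In particular, `mask_and_fill` uses a plain mean fill as a stand-in for the GAN-based reconstruction that Jujutsu actually uses.

```python
import numpy as np

def top_salient_box(saliency, size):
    """Return (row, col) of the size x size window with the largest total saliency."""
    # Integral image lets every window sum be computed with four lookups.
    integral = np.pad(saliency, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = (integral[size:, size:] - integral[:-size, size:]
            - integral[size:, :-size] + integral[:-size, :-size])
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c)

def transplant(source, target, box, size):
    """Copy the window at `box` from `source` onto a copy of `target`."""
    r, c = box
    out = target.copy()
    out[r:r + size, c:c + size] = source[r:r + size, c:c + size]
    return out

def looks_adversarial(predict, image, saliency, holdout, size):
    """Flag `image` if its most salient window also flips a hold-out image
    to the same label (assumed detection rule, not taken from the paper)."""
    box = top_salient_box(saliency, size)
    label = predict(image)
    before = predict(holdout)
    after = predict(transplant(image, holdout, box, size))
    return (after == label and before != label), box

def mask_and_fill(image, box, size):
    """Stand-in for GAN inpainting: overwrite the window with the mean of
    the unmasked pixels. A real system would synthesize the content."""
    r, c = box
    mask = np.zeros(image.shape, dtype=bool)
    mask[r:r + size, c:c + size] = True
    out = image.copy()
    out[mask] = image[~mask].mean()
    return out

# Toy setup (entirely hypothetical): a "model" that outputs class 1
# whenever any pixel is very bright, and a 2x2 bright patch as the attack.
def toy_predict(img):
    return int(img.max() > 0.9)

image = np.full((8, 8), 0.2)
image[2:4, 5:7] = 1.0             # the adversarial patch
holdout = np.full((8, 8), 0.1)    # a clean image from another class
saliency = image                  # pretend saliency equals pixel intensity

suspicious, box = looks_adversarial(toy_predict, image, saliency, holdout, size=2)
recovered = mask_and_fill(image, box, size=2)
```

In this toy setting the patch is flagged because moving it onto the hold-out image changes that image's prediction to the patched image's label, and masking it restores the original prediction. A practical detector would need a real attribution method and a learned inpainting model in place of the stand-ins used here.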