Real-world adversarial physical patches have been shown to compromise state-of-the-art models in a variety of computer vision applications. Existing defenses based on either input-gradient or feature analysis have been compromised by recent GAN-based attacks that generate naturalistic patches. In this paper, we propose Jedi, a new defense against adversarial patches that is resilient to realistic patch attacks. Jedi tackles the patch localization problem from an information-theoretic perspective and leverages two new ideas: (1) it improves the identification of potential patch regions using entropy analysis, as we show that the entropy of adversarial patches is high, even for naturalistic patches; and (2) it improves the localization of adversarial patches using an autoencoder that completes patch regions from high-entropy kernels. Jedi achieves high-precision adversarial patch localization, which we show is critical to successfully repairing the images. Since Jedi relies on input entropy analysis, it is model-agnostic and can be applied to pre-trained, off-the-shelf models without changes to the training or inference of the protected models. Jedi detects on average 90% of adversarial patches across different benchmarks and recovers up to 94% of successful patch attacks (compared to 75% and 65% for LGS and Jujutsu, respectively).
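The entropy intuition behind the first idea can be illustrated with a minimal sketch: compute the Shannon entropy of pixel intensities in sliding windows and flag windows whose entropy exceeds a threshold as candidate patch regions. This is not Jedi's implementation. It covers only the candidate-identification step (no autoencoder-based completion or image repair), assumes a single-channel integer image given as a list of rows, and uses hypothetical window size, stride, and threshold values chosen for illustration rather than taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete pixel intensities."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_map(image, window, stride):
    """Entropy of each window x window region, sampled every `stride` pixels.

    `image` is a list of rows of integer intensities (e.g. 0-255 grayscale).
    Returns a dict mapping each window's top-left (row, col) to its entropy.
    """
    h, w = len(image), len(image[0])
    result = {}
    for r in range(0, h - window + 1, stride):
        for c in range(0, w - window + 1, stride):
            vals = [image[r + i][c + j] for i in range(window) for j in range(window)]
            result[(r, c)] = shannon_entropy(vals)
    return result


def candidate_patch_mask(image, window=4, stride=2, threshold=3.0):
    """Mark every pixel covered by a window whose entropy exceeds `threshold`.

    The defaults for window, stride, and threshold are hypothetical toy values,
    not parameters reported for Jedi.
    """
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    for (r, c), e in entropy_map(image, window, stride).items():
        if e > threshold:
            for i in range(window):
                for j in range(window):
                    mask[r + i][c + j] = True
    return mask


# Usage: a flat 8x8 background with one 4x4 high-entropy block (toy data).
image = [[128] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        image[2 + i][2 + j] = 16 * (4 * i + j) + 1  # 16 distinct intensities
mask = candidate_patch_mask(image)
for row in mask:
    print("".join("#" if m else "." for m in row))
```

In this toy image only the window aligned with the noisy block reaches 4 bits of entropy, while windows that partially overlap it stay at or below 2.5 bits, so the mask covers exactly the block. On real images a single fixed threshold is unlikely to separate naturalistic patches from textured backgrounds, which is the gap the abstract's second idea (autoencoder-based completion of patch regions from high-entropy kernels) is meant to address.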