We Can Always Catch You: Detecting Adversarial Patched Objects WITH or WITHOUT Signature

Object detection based on deep learning has recently proven vulnerable to adversarial patch attacks. An attacker holding a specially crafted patch can hide from state-of-the-art person detectors, e.g., YOLO, even in the physical world. This kind of attack poses serious security threats, such as evading surveillance cameras. In this paper, we explore in depth the problem of detecting adversarial patch attacks against object detection. First, we identify a leverageable signature of existing adversarial patches from the perspective of visualization explanation, and we propose a fast signature-based defense method and demonstrate that it is effective. Second, we design an improved patch generation algorithm to reveal the risk that the signature-based approach may be bypassed by techniques emerging in the future. The newly generated adversarial patches successfully evade the proposed signature-based defense. Finally, we present a novel signature-independent detection method based on the consistency of internal content semantics rather than on any attack-specific prior knowledge. The fundamental intuition is that the adversarial object can appear locally but disappear globally in an input image. Experiments demonstrate that the signature-independent method effectively detects both the existing and the improved attacks. It also proves to be a general method, detecting unforeseen attacks and even other types of attacks without any attack-specific prior knowledge. The two proposed detection methods can be adopted in different scenarios, and we believe that combining them can offer comprehensive protection.
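
The "appear locally but disappear globally" intuition can be illustrated with a minimal sketch. This is not the paper's algorithm: the sliding-window scheme, the IoU matching rule, and every name below (`inconsistent_detections`, `toy_detector`, `window`, `stride`, `iou_thresh`) are assumptions made for illustration. The detector is treated as an opaque callable, and the toy detector only imitates an attack by reporting nothing whenever a "patch" value of 9 is visible in its input.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Image = Sequence[Sequence[int]]
Detector = Callable[[Image], List[Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def inconsistent_detections(image: Image, detect: Detector, window: int,
                            stride: int, iou_thresh: float = 0.5) -> List[Box]:
    """Return boxes found in local crops that have no match in the global result."""
    global_boxes = detect(image)
    height, width = len(image), len(image[0])
    suspicious: List[Box] = []
    for top in range(0, max(1, height - window + 1), stride):
        for left in range(0, max(1, width - window + 1), stride):
            crop = [row[left:left + window] for row in image[top:top + window]]
            for x0, y0, x1, y1 in detect(crop):
                # Map the crop-relative box back to full-image coordinates.
                box = (x0 + left, y0 + top, x1 + left, y1 + top)
                unmatched = all(iou(box, g) < iou_thresh for g in global_boxes)
                if unmatched and box not in suspicious:
                    suspicious.append(box)
    return suspicious


def toy_detector(image: Image) -> List[Box]:
    """Hypothetical stand-in detector: boxes all cells equal to 1,
    but reports nothing if a 'patch' cell (value 9) is visible."""
    if any(9 in row for row in image):
        return []
    cells = [(x, y) for y, row in enumerate(image)
             for x, value in enumerate(row) if value == 1]
    if not cells:
        return []
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


# A 6x6 scene with a 2x2 object in the top-left corner.
clean = [[0] * 6 for _ in range(6)]
for y in range(2):
    for x in range(2):
        clean[y][x] = 1

# The same scene with a simulated patch in the bottom-right corner.
patched = [row[:] for row in clean]
patched[5][5] = 9

print(inconsistent_detections(clean, toy_detector, window=3, stride=3))
print(inconsistent_detections(patched, toy_detector, window=3, stride=3))
```

On the clean scene the local and global results agree, so nothing is flagged; on the patched scene the object is found in a crop that excludes the patch but is missing from the full-image result, so its box is reported. With a real detector, crops that cut through an object would need more careful matching than this single IoU threshold.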
