Adversarial attacks on deep-learning models have received increasing attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called white-box attacks, wherein the attacker has access to the targeted model's internal parameters; such an assumption is usually unrealistic in the real world. Some attacks additionally use the entire pixel space to fool a given model, which is neither practical nor physical (i.e., realizable in the real world). In contrast, we propose herein a gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic physical adversarial patches for object detectors. We show that our proposed method works both digitally and physically.
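The core idea above can be sketched as a black-box search over a GAN's latent space: candidate latent vectors are mutated, decoded into patches, and scored by the detector, with no gradients of the detector ever required. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `generator` and `detection_score` are toy placeholders standing in for the pretrained GAN generator and the object detector's confidence on the patched scene, and the search loop is a simple (1+λ) evolution strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the pretrained GAN generator G(z). In the paper's
# setting, decoding through G keeps the search on its learned manifold
# of natural-looking images; here tanh is a stand-in.
def generator(z):
    return np.tanh(z)

# Placeholder objective: the detector's confidence given the patch.
# The attack MINIMIZES this score. A toy quadratic keeps the example
# self-contained and runnable.
def detection_score(patch):
    target = np.linspace(-0.5, 0.5, patch.size)
    return float(np.mean((patch - target) ** 2))

def gradient_free_attack(dim=16, pop=32, sigma=0.1, iters=200):
    """(1+lambda) evolution strategy over the latent space:
    gradient-free, so the detector is treated as a black box."""
    z = rng.standard_normal(dim)
    best = detection_score(generator(z))
    for _ in range(iters):
        # Mutate the current latent vector and keep only improvements.
        candidates = z + sigma * rng.standard_normal((pop, dim))
        scores = [detection_score(generator(c)) for c in candidates]
        i = int(np.argmin(scores))
        if scores[i] < best:
            z, best = candidates[i], scores[i]
    return z, best

z_adv, score = gradient_free_attack()
```

Because every step touches the detector only through its scalar score, the same loop applies unchanged to a real GAN and detector, which is what makes the approach black-box and practical when model internals are unavailable.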