Adversarial attacks are known to succeed on classifiers, but it has been an open question whether more complex vision systems are also vulnerable. In this paper, we study adversarial examples for vision and language models, which incorporate natural language understanding and complex structures such as attention, localization, and modular architectures. In particular, we investigate attacks on a dense captioning model and on two visual question answering (VQA) models. Our evaluation shows that we can generate adversarial examples for these models with a high success rate (over 90%). Our work sheds new light on adversarial attacks against vision systems with a language component and shows that attention, bounding box localization, and compositional internal structures are all vulnerable to adversarial attacks. These observations will inform future work towards building effective defenses.
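The abstract does not describe the attack procedure itself. As a rough illustration only, the sketch below shows a generic iterative gradient-based (PGD-style) targeted attack against a differentiable model; it is not the paper's specific method. The `model`, `loss_fn`, and `target_label` arguments are hypothetical placeholders, e.g., a VQA model applied to a fixed question together with a desired wrong answer, or a dense captioning model with a target caption.

```python
import torch


def pgd_targeted_attack(model, image, target_label, loss_fn,
                        epsilon=8 / 255, step_size=1 / 255, num_steps=40):
    """Generic targeted PGD sketch: perturb `image` within an L-infinity
    ball of radius `epsilon` so that `model` favors `target_label`.

    `model`, `loss_fn`, and `target_label` are assumed placeholders; the
    paper's actual attack on dense captioning / VQA models may differ.
    """
    original = image.detach()
    adv = original.clone()

    for _ in range(num_steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), target_label)
        grad, = torch.autograd.grad(loss, adv)

        # Step to *decrease* the targeted loss, then project back into the
        # epsilon-ball around the original image and the valid pixel range.
        adv = adv.detach() - step_size * grad.sign()
        adv = original + (adv - original).clamp(-epsilon, epsilon)
        adv = adv.clamp(0.0, 1.0)

    return adv
```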