Video-based action recognition has been studied extensively in recent years. In this paper, we study the vulnerability of deep learning-based action recognition methods to adversarial attacks using a new one-frame attack that adds an inconspicuous perturbation to only a single frame of a given video clip. We investigate the effectiveness of our one-frame attack on state-of-the-art action recognition models, together with a thorough analysis of their vulnerability with respect to model structure and the perceivability of the perturbation. Our method achieves high fooling rates while producing perturbations that are hardly perceivable to human observers, as confirmed by a subjective test. In addition, we present a video-agnostic approach that finds a universal perturbation.
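To make the single-frame idea concrete, the following is a minimal sketch of how a perturbation restricted to one frame could be optimized with a PGD-style procedure. It is not the authors' exact optimization; the model interface, the assumed input layout (batch, time, channels, height, width), and the hyperparameters (`epsilon`, `steps`, `alpha`, `frame_idx`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def one_frame_attack(model, video, label, frame_idx,
                     epsilon=8 / 255, steps=10, alpha=2 / 255):
    """Craft an adversarial perturbation confined to a single frame.

    Assumptions (illustrative, not from the paper):
      - video: tensor of shape (1, T, C, H, W) with values in [0, 1]
      - model(video) returns class logits
      - the perturbation stays inside an L-infinity ball of radius epsilon
    """
    # Perturbation only for the chosen frame; all other frames stay untouched.
    delta = torch.zeros_like(video[:, frame_idx]).requires_grad_(True)

    for _ in range(steps):
        perturbed = video.clone()
        perturbed[:, frame_idx] = (video[:, frame_idx] + delta).clamp(0, 1)

        # Untargeted attack: increase the loss of the true label.
        loss = F.cross_entropy(model(perturbed), label)
        loss.backward()

        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient-ascent step
            delta.clamp_(-epsilon, epsilon)      # keep within the budget
            delta.grad.zero_()

    adversarial = video.clone()
    adversarial[:, frame_idx] = (video[:, frame_idx] + delta.detach()).clamp(0, 1)
    return adversarial
```

Because the perturbation budget is spent on a single frame rather than spread across the clip, the modified video remains visually close to the original while still changing the model's prediction.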