Deep neural networks for video classification, like image classification networks, are vulnerable to adversarial manipulation. The main difference between image classifiers and video classifiers is that the latter usually exploit the temporal information contained within the video. In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that is practically unnoticeable to human observers and is implementable in the real world. After demonstrating the manipulation of action classification on single videos, we generalize the procedure to produce a universal adversarial perturbation that achieves a high fooling ratio. We further extend the universal perturbation to a temporally invariant one, which can be applied to the video without synchronizing the perturbation to the input. The attack was implemented on several target models and its transferability was demonstrated. These properties allow us to bridge the gap between simulated environments and real-world applications, as demonstrated in this paper for the first time for an over-the-air flickering attack.
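To make the notion of a flickering temporal perturbation concrete, the following is a minimal sketch (not the paper's implementation) of how such a perturbation might be applied to a clip. It assumes the perturbation is a per-frame uniform RGB offset, and the `t_offset` argument illustrates the unsynchronized, temporally invariant application mentioned above; all names here are hypothetical.

```python
import numpy as np

def apply_flickering_perturbation(clip, delta, t_offset=0):
    """Apply a flickering perturbation to a video clip.

    clip     : float array of shape (T, H, W, 3), pixel values in [0, 1].
    delta    : float array of shape (T, 3); one RGB offset per frame,
               applied uniformly to every pixel of that frame (the "flicker").
    t_offset : cyclic shift of the perturbation along the time axis,
               modeling an attack that is not synchronized with the input.
    """
    rolled = np.roll(delta, shift=t_offset, axis=0)       # (T, 3)
    perturbed = clip + rolled[:, None, None, :]           # broadcast over H, W
    return np.clip(perturbed, 0.0, 1.0)

# Usage: a random 16-frame clip and a small sinusoidal flicker pattern.
clip = np.random.rand(16, 112, 112, 3).astype(np.float32)
delta = 0.02 * np.sin(np.linspace(0, 2 * np.pi, 16))[:, None] * np.ones((1, 3), np.float32)
adv_clip = apply_flickering_perturbation(clip, delta, t_offset=5)
```

Because each frame receives a single spatially uniform RGB offset, the perturbation appears to a human observer only as a slight flicker in overall brightness or color, which is what makes a real-world (e.g., LED-based) realization plausible.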