Widely deployed deep neural network (DNN) models have been proven vulnerable to adversarial perturbations in many applications (e.g., image, audio, and text classification). To date, only a few adversarial perturbations have been proposed to deviate DNN models in video recognition systems, and they simply inject 2D perturbations into video frames. However, such attacks may overly perturb the videos without learning the spatio-temporal features (across temporal frames) that DNN models commonly extract for video recognition. To the best of our knowledge, we propose the first black-box attack framework that generates universal 3-dimensional (U3D) perturbations to subvert a variety of video recognition systems. U3D has many advantages: (1) as a transfer-based attack, U3D can universally attack multiple DNN models for video recognition without access to the target DNN model; (2) the high transferability of U3D makes such a universal black-box attack easy to launch, and it can be further enhanced by integrating queries over the target model when necessary; (3) U3D ensures human imperceptibility; (4) U3D can bypass existing state-of-the-art defense schemes; (5) U3D can be efficiently generated with a few pre-learned parameters and then immediately injected to attack real-time DNN-based video recognition systems. We have conducted extensive experiments to evaluate U3D on multiple DNN models and three large-scale video datasets. The experimental results demonstrate its superiority and practicality.