We study black-box adversarial attacks on video recognition models. Because a video is high-dimensional, searching for adversarial perturbations over all of it is computationally expensive, so we attack only selected key regions and key frames. One way to select key frames is to use heuristic algorithms that score the importance of each frame and keep the most important ones, but the sorting and searching this requires is time-consuming. To speed up the attack, we propose a reinforcement-learning-based frame selection strategy. Specifically, the agent explores the difference between videos of the original class and the target class to make its selection decisions, and it receives rewards from the threat models that indicate the quality of those decisions. In addition, we use saliency detection to select key regions, and in zeroth-order optimization we estimate only the sign of the gradient rather than the gradient itself, which further accelerates the attack. The trained model can be used directly in the untargeted attack, or with a little fine-tuning in the targeted attack, which saves computation time. A range of empirical results on real datasets demonstrates the effectiveness and efficiency of the proposed method.
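To make the sign-only zeroth-order step concrete, the following is a minimal sketch and not the paper's implementation. It assumes a random-direction finite-difference estimator (a common zeroth-order choice, swapped in here for illustration), a query-only scalar `loss_fn`, and a binary `mask` that marks the selected key frames and key regions. The names `estimate_grad_sign`, `masked_sign_attack`, `loss_fn`, `mask`, and all hyperparameter values are hypothetical.

```python
import numpy as np


def estimate_grad_sign(loss_fn, x, mask, n_samples=20, sigma=1e-3, rng=None):
    """Estimate the sign of the gradient of loss_fn at x using queries only.

    Assumption: an antithetic random-direction estimator; the paper may
    use a different one. Only entries where mask == 1 are probed.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        # Random probe direction, zeroed outside key frames / key regions.
        u = rng.standard_normal(x.shape) * mask
        # Two queries per direction: a symmetric finite difference.
        delta = loss_fn(x + sigma * u) - loss_fn(x - sigma * u)
        grad += delta * u
    # Keep only the sign; the magnitude of the estimate is discarded.
    return np.sign(grad) * mask


def masked_sign_attack(loss_fn, x, mask, step=0.01, eps=0.05,
                       n_iters=50, rng=None):
    """Iteratively lower loss_fn by signed steps on the masked entries.

    The perturbation is kept inside an L-infinity ball of radius eps
    around x, and the result is kept inside the valid range [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x.copy()
    for _ in range(n_iters):
        s = estimate_grad_sign(loss_fn, x_adv, mask, rng=rng)
        x_adv = x_adv - step * s                   # signed descent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # perturbation budget
        x_adv = np.clip(x_adv, 0.0, 1.0)           # valid pixel range
    return x_adv
```

Here `x` would be a video array such as `(frames, height, width, channels)` scaled to `[0, 1]`, and `mask` an array of the same shape holding 1 on the chosen frames and regions and 0 elsewhere. Restricting the probe directions to the mask is what lets key-frame and key-region selection reduce the effective search dimension.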