Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label that the attacker wishes to exploit. A backdoored DNN performs well on clean test images, yet persistently predicts the attacker-defined label for any sample that contains the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, few works explore such attacks in the video domain, and those that do tend to conclude that image backdoor attacks are less effective on video. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into it. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, showing that attacking a single modality is sufficient to achieve a high attack success rate.
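To make the two temporal extensions concrete, here is a minimal sketch, not the paper's exact implementation: assuming clips are stored as (T, H, W, C) arrays, a static attack stamps one fixed trigger patch onto every frame, while a dynamic attack lets the patch vary from frame to frame. The function names `poison_static` and `poison_dynamic`, the patch placement, and the helpers `trigger_fn`, `checker_patch`, and `TARGET_LABEL` are all hypothetical choices for illustration.

```python
import numpy as np

def poison_static(video: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Static temporal extension: paste the same trigger patch on every frame.

    video:   (T, H, W, C) uint8 clip
    trigger: (h, w, C) uint8 patch
    """
    poisoned = video.copy()
    h, w = trigger.shape[:2]
    poisoned[:, -h:, -w:, :] = trigger  # fixed bottom-right corner, all frames
    return poisoned

def poison_dynamic(video: np.ndarray, trigger_fn) -> np.ndarray:
    """Dynamic temporal extension: apply a per-frame trigger.

    trigger_fn(t) returns the (h, w, C) patch for frame t, so the
    trigger's appearance can vary over time.
    """
    poisoned = video.copy()
    for t in range(poisoned.shape[0]):
        patch = trigger_fn(t)
        h, w = patch.shape[:2]
        poisoned[t, -h:, -w:, :] = patch
    return poisoned

# Poisoned-label attack: pair the triggered clip with the attacker's target class.
# video = load_clip(...)                   # hypothetical loader
# x = poison_static(video, checker_patch)  # or poison_dynamic(video, trigger_fn)
# y = TARGET_LABEL
```

In a poisoned-label attack, a small fraction of training clips is triggered this way and relabeled with the target class, so the model learns to associate the trigger with that label while remaining accurate on clean clips.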