It is well-known that recurrent neural networks (RNNs), although widely used, are vulnerable to adversarial attacks, including one-frame attacks and multi-frame attacks. Though a few certified defenses exist to provide guaranteed robustness against one-frame attacks, we prove that defending against multi-frame attacks remains a challenging problem due to their enormous perturbation space. In this paper, we propose RNN-Guard, the first certified defense against multi-frame attacks for RNNs. To address the above challenge, we adopt the perturb-all-frame strategy to construct perturbation spaces consistent with those in multi-frame attacks. However, the perturb-all-frame strategy causes a precision issue in linear relaxations. To address this issue, we introduce a novel abstract domain called InterZono and design tighter relaxations. We prove that InterZono is more precise than Zonotope yet carries the same time complexity. Experimental evaluations across various datasets and model structures show that the certified robust accuracy computed by RNN-Guard with InterZono is up to 2.18 times higher than that with Zonotope. In addition, we extend RNN-Guard into the first certified training method against multi-frame attacks to directly enhance RNNs' robustness. The results show that the certified robust accuracy of models trained with RNN-Guard against multi-frame attacks is 15.47 to 67.65 percentage points higher than that of models trained with other methods.
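To make the gap between the two threat models concrete, the perturbation sets can be sketched as follows. The notation is illustrative rather than taken from the paper and assumes an $\ell_\infty$ budget $\epsilon$ on a $T$-frame input $x = (x_1, \dots, x_T)$. A one-frame attack may modify only a single frame,
\[
\mathcal{B}_{\mathrm{one}}(x) = \{\, x' \mid \exists\, t \in \{1,\dots,T\} : \|x'_t - x_t\|_\infty \le \epsilon \ \text{and}\ x'_s = x_s \ \text{for all } s \ne t \,\},
\]
whereas a multi-frame attack may perturb every frame simultaneously,
\[
\mathcal{B}_{\mathrm{all}}(x) = \{\, x' \mid \|x'_t - x_t\|_\infty \le \epsilon \ \text{for all } t \,\}.
\]
Since $\mathcal{B}_{\mathrm{one}}(x) \subset \mathcal{B}_{\mathrm{all}}(x)$ and the all-frame set grows with every additional frame, a certified defense must propagate bounds over a much larger region; this is what the perturb-all-frame strategy covers and what makes precise linear relaxations harder to maintain.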