Despite their great success in image recognition tasks, deep neural networks (DNNs) have been observed to be susceptible to universal adversarial perturbations (UAPs), which perturb all input samples with a single perturbation vector. However, UAPs often transfer poorly across DNN architectures and lead to challenging optimization problems. In this work, we study the transferability of UAPs by analyzing the equilibria of the universal adversarial example game between the classifier and UAP adversary players. We show that, under mild assumptions, the universal adversarial example game lacks a pure Nash equilibrium, indicating the suboptimal transferability of UAPs across DNN classifiers. To address this issue, we propose Universal Adversarial Directions (UADs), which fix only a universal direction for the adversarial perturbations and allow each perturbation's magnitude to be chosen freely across samples. We prove that the UAD adversarial example game can possess a Nash equilibrium with a pure UAD strategy, implying the potential transferability of UADs. We also connect the UAD optimization problem to principal component analysis (PCA) and develop an efficient PCA-based algorithm for optimizing UADs. We evaluate UADs on multiple benchmark image datasets. Our numerical results show that UADs transfer better than standard gradient-based UAPs.
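To make the distinction between a UAP and a UAD concrete, the following NumPy sketch contrasts the two on flattened inputs. It is an illustration under stated assumptions, not the algorithm developed in this work: it assumes a first-order (linearized) loss, takes per-sample loss gradients as given, and uses the top principal direction of the uncentered gradient matrix as a stand-in for the PCA connection mentioned above. The function names (`top_principal_direction`, `apply_uad`, `apply_uap`) and the fixed budget `eps` are hypothetical.

```python
import numpy as np


def top_principal_direction(grads):
    """Unit vector d maximizing sum_i (g_i . d)^2.

    Assumption: this squared first-order surrogate stands in for the
    UAD objective; it equals the top right-singular vector of the
    (uncentered) gradient matrix, i.e. a PCA-style direction.
    """
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[0]


def apply_uap(x, delta):
    """UAP: the same perturbation vector is added to every sample."""
    return x + delta[None, :]


def apply_uad(x, direction, grads, eps):
    """UAD: shared direction, per-sample signed magnitude.

    Assumption: under a linearized loss with |t_i| <= eps, the best
    coefficient for sample i is t_i = eps * sign(g_i . d).
    """
    coeff = eps * np.sign(grads @ direction)
    return x + coeff[:, None] * direction[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 5))       # toy "images", flattened
    grads = rng.normal(size=(8, 5))   # toy per-sample loss gradients
    d = top_principal_direction(grads)
    x_uad = apply_uad(x, d, grads, eps=0.1)
    x_uap = apply_uap(x, 0.1 * d)
    # First-order loss change per sample for each attack.
    print("UAD gain:", ((x_uad - x) * grads).sum(axis=1))
    print("UAP gain:", ((x_uap - x) * grads).sum(axis=1))
```

In this toy setting every UAD perturbation is parallel to `d`, but its sign is chosen per sample, so the linearized loss change is never negative; the UAP applies one fixed vector and can decrease the loss on some samples.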