Despite their great success in image recognition tasks, deep neural networks (DNNs) have been observed to be susceptible to universal adversarial perturbations (UAPs), which perturb all input samples with a single perturbation vector. However, UAPs often transfer poorly across DNN architectures and lead to challenging optimization problems. In this work, we study the transferability of UAPs by analyzing equilibria in the universal adversarial example game between the classifier and UAP adversary players. We show that under mild assumptions the universal adversarial example game lacks a pure Nash equilibrium, indicating UAPs' suboptimal transferability across DNN classifiers. To address this issue, we propose Universal Adversarial Directions (UADs), which fix only a universal direction for adversarial perturbations and allow the perturbations' magnitude to be chosen freely across samples. We prove that the UAD adversarial example game can possess a Nash equilibrium with a pure UAD strategy, implying the potential transferability of UADs. We also connect the UAD optimization problem to principal component analysis (PCA) and develop an efficient PCA-based algorithm for optimizing UADs. We evaluate UADs over multiple benchmark image datasets. Our numerical results show the superior transferability of UADs over standard gradient-based UAPs.
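To make the distinction between a UAP and a UAD concrete, the following is a minimal sketch and not the algorithm developed in this work. It assumes that the per-sample loss gradients with respect to the inputs are already available as rows of an array, that the loss is linearized around each sample, and that a per-sample magnitude is a signed scalar in [-eps, eps]. Under those assumptions, a natural PCA-style choice of direction is the top principal direction of the uncentered gradient matrix. All function names (`top_direction`, `uap_perturbations`, `uad_perturbations`, `first_order_gain`) are hypothetical.

```python
import numpy as np


def top_direction(grads):
    """Unit vector d maximizing sum_i (g_i . d)^2, i.e. uncentered PCA."""
    g = np.asarray(grads, dtype=float)
    # The first right singular vector of g is the top eigenvector of g.T @ g.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return vt[0]


def uap_perturbations(grads, d, eps):
    """UAP-style: one fixed vector for every sample (best global sign of d)."""
    g = np.asarray(grads, dtype=float)
    sign = 1.0 if np.sum(g @ d) >= 0 else -1.0
    return np.tile(eps * sign * d, (g.shape[0], 1))


def uad_perturbations(grads, d, eps):
    """UAD-style: shared direction d, per-sample signed magnitude (assumed +/- eps)."""
    g = np.asarray(grads, dtype=float)
    # To first order, the loss gain of t * d is t * (g_i . d), so pick t = +/- eps.
    t = eps * np.where(g @ d >= 0, 1.0, -1.0)
    return t[:, None] * d[None, :]


def first_order_gain(grads, deltas):
    """Sum over samples of the linearized loss increase g_i . delta_i."""
    return float(np.sum(np.asarray(grads, dtype=float) * deltas))
```

In this linearized toy setting, the UAD gain equals eps times the sum of |g_i . d|, which is never smaller than the gain of the single fixed vector along the same direction. This illustrates the extra freedom a UAD has. It says nothing about transferability across classifiers, which is the subject of the game-theoretic analysis.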