While the untargeted black-box transferability of adversarial perturbations has been extensively studied, changing an unseen model's decisions to a specific `targeted' class remains challenging. In this paper, we propose a new generative approach for highly transferable targeted perturbations (\ours). We note that existing methods are less suitable for this task because they rely on class-boundary information, which changes from one model to another and thus reduces transferability. In contrast, our approach matches the `distribution' of perturbed images with that of the target class, leading to high targeted transferability rates. To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains. Based on the proposed objective, we train a generator function that adaptively synthesizes perturbations specific to a given input. Our generative approach is independent of the source or target domain labels and consistently performs well against state-of-the-art methods across a wide range of attack settings. As an example, we achieve $32.63\%$ target transferability from (an adversarially weak) VGG19$_{BN}$ to (a strong) WideResNet on the ImageNet validation set, which is 4$\times$ higher than the previous best generative attack and 16$\times$ higher than instance-specific iterative attacks. Code is available at: {\small\url{https://github.com/Muzammal-Naseer/TTP}}.
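The global distribution-matching idea can be illustrated with a small sketch. It assumes, purely for illustration, that the two distributions are compared through a symmetric KL divergence over a surrogate classifier's softmax outputs; the abstract does not name the divergence, and the function names below (`softmax`, `kl_divergence`, `distribution_matching_loss`) are hypothetical. The local neighbourhood term is omitted.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distribution_matching_loss(perturbed_logits, target_logits):
    """Assumed objective: symmetric KL between the classifier outputs of
    perturbed source images and of target-class images, averaged over pairs.

    Both arguments are lists of logit vectors of equal length.
    """
    total = 0.0
    for zp, zt in zip(perturbed_logits, target_logits):
        p, q = softmax(zp), softmax(zt)
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(perturbed_logits)
```

In a training loop under these assumptions, a generator would be updated to reduce this loss, so that the classifier's response to perturbed images moves toward its response to target-class images without using class labels directly.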