Enhancing Adversarial Transferability with Adversarial Weight Tuning

Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adversarial transferability. In this paper, we rethink the properties of transferable AEs and reformulate transferability. Building on insights from this mechanism, we analyze the generalization of AEs across models with different architectures and prove that we can find a local perturbation to mitigate the gap between surrogate and target models. We further establish the inner connections between model smoothness and flat local maxima, both of which contribute to the transferability of AEs. Based on these findings, we propose a new adversarial attack algorithm, Adversarial Weight Tuning (AWT), which adaptively adjusts the parameters of the surrogate model using generated AEs to optimize the flat local maxima and model smoothness simultaneously, without the need for extra data. AWT is a data-free tuning method that combines gradient-based and model-based attack methods to enhance the transferability of AEs. Extensive experiments on a variety of models with different architectures on ImageNet demonstrate that AWT outperforms other attacks, increasing attack success rates by nearly 5% on CNN-based models and nearly 10% on Transformer-based models on average, compared to state-of-the-art attacks. Code is available at https://github.com/xaddwell/AWT.
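To make the transfer setting concrete, the following is a minimal sketch of a plain iterative sign-gradient attack crafted on a surrogate model and then evaluated on a separate target model. It is not AWT: the abstract does not specify the weight-tuning update, so that step is not reproduced here. The two linear classifiers, their weights, the input, and the names `ifgsm`, `eps`, `alpha`, and `steps` are all hypothetical, chosen only for illustration.

```python
import math

def logit(w, b, x):
    """Linear score w.x + b of a toy binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Predicted label in {0, 1}."""
    return 1 if logit(w, b, x) > 0 else 0

def loss_grad_x(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def ifgsm(w, b, x, y, eps, alpha, steps):
    """Iterative sign-gradient ascent on the surrogate's loss,
    projected back into the L_inf ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical surrogate (white-box) and target (black-box) models.
W_SUR, B_SUR = [1.0, -2.0, 0.5], 0.0
W_TGT, B_TGT = [0.8, -1.5, 0.7], 0.1

x_clean, y_true = [0.5, -0.2, 0.3], 1

# The attack only ever queries gradients of the surrogate.
x_adv = ifgsm(W_SUR, B_SUR, x_clean, y_true, eps=0.4, alpha=0.1, steps=10)

print("surrogate:", predict(W_SUR, B_SUR, x_clean), "->", predict(W_SUR, B_SUR, x_adv))
print("target:   ", predict(W_TGT, B_TGT, x_clean), "->", predict(W_TGT, B_TGT, x_adv))
```

The example transfers here only because the two toy models have similar weights. The abstract's claim is that tuning the surrogate's parameters toward flatter local maxima and a smoother model narrows the surrogate-target gap when the models differ more.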

