Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability substantially increases when the training of the surrogate model is stopped early. A common hypothesis to explain this is that the non-robust features that adversarial attacks exploit are learned in the later training epochs; hence, an early-stopped model is more robust (and thus a better surrogate) than a fully trained one. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even for models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss drops significantly. This leads us to propose RFN, a new approach to transferability that minimizes the sharpness of the loss during training in order to maximize transferability. We show that, by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive with (if not better than) strong state-of-the-art baselines.
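The abstract does not spell out how sharpness is minimized during surrogate training. As a minimal sketch only, the idea of "searching for large flat neighborhoods" can be illustrated with a generic SAM-style update (Foret et al.): ascend to the approximate worst point within a radius-`rho` ball around the current weights, then descend from there. The function name `sam_step` and the hyperparameter `rho` are illustrative placeholders; the actual RFN procedure may differ.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One sharpness-aware update: perturb the weights toward the worst-case
    direction inside an L2 ball of radius rho, then step using the gradient
    taken at that perturbed point. A generic SAM-style sketch, not the exact
    RFN procedure from the paper."""
    optimizer.zero_grad()

    # First forward/backward pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    with torch.no_grad():
        # Norm of the full gradient, used to scale the ascent step.
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)  # climb to the (approximate) sharpest point nearby
            eps.append(e)

    # Second pass: gradient at the perturbed weights drives the actual update.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)  # restore the original weights before stepping

    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A larger `rho` enforces flatness over a wider neighborhood, which is the intuition the abstract appeals to when linking flat minima of the surrogate to higher transferability.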