Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability substantially increases when the training of the surrogate model is stopped early. A common hypothesis to explain this is that the non-robust features that adversarial attacks exploit are learned in the later training epochs; hence, an early-stopped model is more robust (and thus a better surrogate) than a fully trained one. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even for models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss drops significantly. This leads us to propose RFN, a new approach to transferability that minimizes the sharpness of the loss during training in order to maximize transferability. We show that, by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive with (if not better than) strong state-of-the-art baselines.
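The abstract does not spell out how sharpness is minimized during surrogate training. As a minimal sketch only, the idea of "searching for large flat neighborhoods" can be illustrated with a generic SAM-style update (Foret et al.): ascend to the approximate worst point within a radius-`rho` ball around the current weights, then descend from there. The function name `sam_step` and the hyperparameter `rho` are illustrative placeholders; the actual RFN procedure may differ.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One sharpness-aware update: perturb the weights toward the worst-case
    direction inside an L2 ball of radius rho, then step using the gradient
    taken at that perturbed point. A generic SAM-style sketch, not the exact
    RFN procedure from the paper."""
    optimizer.zero_grad()

    # First forward/backward pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    with torch.no_grad():
        # Norm of the full gradient, used to scale the ascent step.
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)  # climb to the (approximate) sharpest point nearby
            eps.append(e)

    # Second pass: gradient at the perturbed weights drives the actual update.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)  # restore the original weights before stepping

    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A larger `rho` enforces flatness over a wider neighborhood, which is the intuition the abstract appeals to when linking flat minima of the surrogate to higher transferability.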