TWINS: A Fine-Tuning Framework for Improving Adversarial Robustness and Generalization

Recent years have seen the ever-increasing importance of pre-trained models and their downstream training in deep learning research and applications. At the same time, defenses against adversarial examples have mainly been investigated in the context of training from random initialization on simple classification tasks. To better exploit the potential of pre-trained models for adversarial robustness, this paper focuses on fine-tuning an adversarially pre-trained model on various classification tasks. Existing research shows that, because a robustly pre-trained model has already learned a robust feature extractor, the crucial question is how to maintain this robustness while learning the downstream task. We study model-based and data-based approaches to this goal and find that neither of these two common approaches can improve both generalization and adversarial robustness. We therefore propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework, which consists of two neural networks, one of which keeps the population means and variances of the pre-training data in its batch normalization layers. Beyond this transfer of robust information, TWINS increases the effective learning rate without hurting training stability, because it breaks the usual relationship between a weight norm and its gradient norm in a standard batch normalization layer; this lets optimization escape the sub-optimal initialization faster and alleviates robust overfitting. Finally, TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness. Our code is available at https://github.com/ziquanliu/CVPR2023-TWINS.
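To make the batch-normalization mechanism concrete, below is a minimal PyTorch sketch of the core idea: a "frozen" wing that normalizes with the population statistics recorded during (adversarial) pre-training, next to an "adaptive" wing that behaves like standard BN on the downstream data. The class name TwoWingBatchNorm2d, the wing argument, and the per-layer granularity are illustrative assumptions for this sketch, not the authors' exact implementation (see the repository above for that).

```python
import torch
import torch.nn as nn

class TwoWingBatchNorm2d(nn.Module):
    """Illustrative two-wing batch normalization (hypothetical sketch,
    not the authors' exact implementation).

    The 'frozen' wing normalizes with the population mean/variance
    recorded during pre-training and never updates them; the 'adaptive'
    wing is an ordinary BN layer that adapts to the downstream data.
    """

    def __init__(self, bn_pretrained: nn.BatchNorm2d):
        super().__init__()
        # Adaptive wing: standard BN, initialized from the pre-trained layer.
        self.adaptive = nn.BatchNorm2d(bn_pretrained.num_features)
        self.adaptive.load_state_dict(bn_pretrained.state_dict())
        # Frozen wing: same initialization, but kept in eval mode so it
        # always normalizes with the stored pre-training statistics.
        self.frozen = nn.BatchNorm2d(bn_pretrained.num_features)
        self.frozen.load_state_dict(bn_pretrained.state_dict())
        self.frozen.eval()

    def train(self, mode: bool = True):
        # Keep the frozen wing in eval mode even when the model trains,
        # so its population mean/variance are never overwritten.
        super().train(mode)
        self.frozen.eval()
        return self

    def forward(self, x: torch.Tensor, wing: str = "adaptive") -> torch.Tensor:
        return self.adaptive(x) if wing == "adaptive" else self.frozen(x)
```

In this reading of the method, each fine-tuning batch would be forwarded once per wing (the wings share all non-BN weights) and the two adversarial losses summed, so the frozen wing keeps anchoring features to the pre-training statistics while the adaptive wing fits the downstream task.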