Recent self-supervision methods have succeeded in learning feature representations that rival those obtained with full supervision, and have been shown to benefit models in several ways, for example by improving model robustness and out-of-distribution detection. In this paper, we conduct an empirical study to understand more precisely how self-supervised learning, whether as a pre-training technique or as part of adversarial training, affects model robustness to $l_2$ and $l_{\infty}$ adversarial perturbations and to natural image corruptions. Self-supervision can indeed improve model robustness, but it turns out the devil is in the details. If one simply adds a self-supervision loss in tandem with adversarial training, the model's accuracy improves when evaluated under adversarial perturbations smaller than or comparable to the $\epsilon_{train}$ with which the robust model is trained. However, for $\epsilon_{test} \ge \epsilon_{train}$, the model's accuracy drops; in fact, the larger the weight of the self-supervision loss, the larger the drop in performance, i.e. the more the robustness of the model is harmed. We identify the primary ways in which self-supervision can be added to adversarial training, and observe that using a self-supervised loss both to optimize the network parameters and to find adversarial examples leads to the strongest improvement in model robustness, as this can be viewed as a form of ensemble adversarial training. Although self-supervised pre-training yields benefits for adversarial training compared to random weight initialization, we observe no benefit to model robustness or accuracy when self-supervision is incorporated into adversarial training itself.
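The strongest variant above, using the self-supervised loss both to update the model and to craft the adversarial examples, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it uses a toy linear model with an analytic input gradient, a 4-way auxiliary label standing in for a self-supervised task such as rotation prediction, and assumed hyperparameters (the loss weight `lam`, the PGD step size).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

class LinearModel:
    """Toy linear model with a supervised head and a self-supervised head."""
    def __init__(self, dim, n_cls, n_ssl):
        self.W_sup = rng.normal(0.0, 0.1, (dim, n_cls))
        self.W_ssl = rng.normal(0.0, 0.1, (dim, n_ssl))

    def losses(self, x, y, y_ssl):
        return (cross_entropy(softmax(x @ self.W_sup), y),
                cross_entropy(softmax(x @ self.W_ssl), y_ssl))

def combined_loss_grad_x(model, x, y, y_ssl, lam):
    """Analytic input gradient of L_sup + lam * L_ssl for the linear-softmax model."""
    n = len(y)
    p_sup = softmax(x @ model.W_sup)
    p_sup[np.arange(n), y] -= 1.0
    p_ssl = softmax(x @ model.W_ssl)
    p_ssl[np.arange(n), y_ssl] -= 1.0
    return (p_sup @ model.W_sup.T + lam * (p_ssl @ model.W_ssl.T)) / n

def pgd_attack(model, x, y, y_ssl, eps, steps=5, lam=1.0):
    """l_inf PGD that ascends the *combined* loss (the ensemble-style attack)."""
    x_adv = x.copy()
    alpha = 2.5 * eps / steps
    for _ in range(steps):
        g = combined_loss_grad_x(model, x_adv, y, y_ssl, lam)
        x_adv = x_adv + alpha * np.sign(g)        # ascend the combined loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv

# Toy data: 2 supervised classes plus a fake 4-way self-supervised label
# (standing in for, e.g., a rotation-prediction task).
x = rng.normal(size=(32, 8))
y = rng.integers(0, 2, 32)
y_ssl = rng.integers(0, 4, 32)

model = LinearModel(dim=8, n_cls=2, n_ssl=4)
eps = 0.1
x_adv = pgd_attack(model, x, y, y_ssl, eps)
```

In a full training loop, the model parameters would then be updated on `x_adv` with the same combined objective, so the self-supervised loss shapes both the attack and the defense.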