Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
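Two of the ingredients named above can be sketched in a few lines. The decay constant and the plain-dict parameter representation below are illustrative assumptions for exposition, not details taken from the paper; the SiLU definition, x · sigmoid(x), is standard.

```python
import math

def silu(x):
    """Swish/SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def ema_update(avg_params, params, decay=0.999):
    """Model weight averaging via an exponential moving average:
    blend the current parameters into a running shadow copy.
    (decay=0.999 is a typical but illustrative choice.)"""
    return {k: decay * avg_params[k] + (1.0 - decay) * params[k]
            for k in params}

# Usage: keep a shadow copy of the weights and refresh it each training
# step; the averaged weights, not the raw ones, are used at evaluation.
avg = {"w": 1.0}
for step_w in (0.0, 2.0):
    avg = ema_update(avg, {"w": step_w}, decay=0.5)
# avg["w"] is now 0.5 * (0.5 * 1.0 + 0.5 * 0.0) + 0.5 * 2.0 = 1.25
```

The averaged copy changes slowly, which smooths out per-step noise in the trained weights; this is the "model weight averaging" combined with larger models and Swish/SiLU activations in the results above.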