Previous adversarial training methods raise model robustness at the cost of accuracy on natural data. In this paper, we reduce this natural accuracy degradation. We use the logits from a clean model to guide the learning of a robust model, based on the observation that logits from a well-trained clean model embed the most discriminative features of natural data, {\it e.g.}, a generalizable classifier boundary. Our solution is to constrain the logits of the robust model, which takes adversarial examples as input, to be similar to those of the clean model fed with the corresponding natural data. In this way, the robust model inherits the classifier boundary of the clean model. Moreover, we observe that such boundary guidance not only preserves high natural accuracy but also benefits model robustness, which offers new insight and facilitates progress for the adversarial learning community. Finally, extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet verify the effectiveness of our method. We achieve new state-of-the-art robustness on CIFAR-100, without additional real or synthetic data, under the Auto-Attack benchmark\footnote{\url{https://github.com/fra31/auto-attack}}. Our code is available at \url{https://github.com/dvlab-research/LBGAT}.
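As a minimal illustration of the logit constraint described above (the concrete distance term, the cross-entropy term on the clean model, and the symbols $f_{\theta_c}$, $f_{\theta_r}$, and $\lambda$ are assumptions for exposition, not definitions taken from this abstract), one possible instantiation of the objective is
\[
\min_{\theta_c,\,\theta_r}\;\; \mathcal{L}_{\mathrm{CE}}\big(f_{\theta_c}(x),\, y\big)\;+\;\lambda\,\big\| f_{\theta_r}(x_{\mathrm{adv}}) - f_{\theta_c}(x) \big\|_2^2,
\]
where $f_{\theta_c}(x)$ denotes the clean model's logits on the natural input $x$, $f_{\theta_r}(x_{\mathrm{adv}})$ denotes the robust model's logits on the corresponding adversarial example, and $\lambda$ balances natural-data classification against the boundary-guidance constraint that lets the robust model inherit the clean model's classifier boundary.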