Recent studies have shown that robustness to adversarial attacks can be transferred across networks; in other words, a weak model can be made more robust with the help of a strong teacher model. We ask: instead of learning from a static teacher, can models "learn together" and "teach each other" to achieve better robustness? In this paper, we study how interactions among models affect robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger space of adversarial samples and to find more robust feature spaces and decision boundaries. Through extensive experiments on CIFAR-10 and CIFAR-100, we demonstrate that MAT effectively improves model robustness and outperforms state-of-the-art methods under white-box attacks, bringing a $\sim$8% accuracy gain over vanilla adversarial training (AT) under PGD-100 attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types, bringing as much as a 13.1% accuracy gain over AT baselines against the union of $l_\infty$, $l_2$, and $l_1$ attacks. These results show the superiority of the proposed method and demonstrate that collaborative learning is an effective strategy for designing robust models.
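The core idea of the abstract — models trained together on their own adversarial examples while distilling knowledge from each other's predictions — can be sketched in miniature. The following is an illustrative toy, not the paper's actual method: it uses two scalar logistic models on 1-D data, a single-step FGSM attack standing in for PGD, and a Bernoulli KL term for mutual distillation; all hyperparameters (`eps`, `lr`, `alpha`) and the toy dataset are assumptions made for the sketch.

```python
import math
import random

def sigmoid(z):
    # Numerically clamped logistic function.
    z = max(min(z, 30.0), -30.0)
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

# Toy 1-D data: label 1 iff x > 0, with a margin of 0.5 around the boundary.
random.seed(0)
data = []
for _ in range(200):
    x = random.uniform(0.5, 2.0) * random.choice([1, -1])
    data.append((x, 1 if x > 0 else 0))

eps = 0.3        # adversarial perturbation budget (assumed)
lr = 0.5         # learning rate (assumed)
alpha = 1.0      # weight of the mutual-distillation term (assumed)
w = [0.1, -0.1]  # two peer models, each a scalar logistic weight

for epoch in range(50):
    for x, y in data:
        # Each model crafts its own FGSM adversarial example
        # (a one-step stand-in for the PGD attacks used in the paper).
        x_adv = [x + eps * sign((sigmoid(wi * x) - y) * wi) for wi in w]
        for i in range(2):
            j = 1 - i
            p = sigmoid(w[i] * x_adv[i])  # own prediction on own adv. input
            q = sigmoid(w[j] * x_adv[i])  # peer's prediction: teacher signal
            grad = (p - y) * x_adv[i]             # adversarial CE gradient
            grad += alpha * (p - q) * x_adv[i]    # KL(peer || self) gradient
            w[i] -= lr * grad

# Robust accuracy of both models under the worst-case
# eps-perturbation toward the decision boundary.
acc = sum(
    (sigmoid(wi * (x - eps * sign(x))) > 0.5) == (y == 1)
    for x, y in data for wi in w
) / (2 * len(data))
```

In this sketch each model receives two gradient signals on its own adversarial input: the usual adversarial cross-entropy term and a pull toward its peer's prediction, so the two models share what they have learned about each other's adversarial examples rather than learning from a fixed teacher.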