Adversarial training is one of the most promising methods for learning models that are robust to adversarial examples. A recent study showed that knowledge distillation between models of the same architecture is effective for improving the performance of adversarial training. Exploiting knowledge distillation in this way is a new direction for improving adversarial training and has attracted considerable attention; however, its performance is still insufficient. We therefore propose Adversarial Robust Distillation with Internal Representation~(ARDIR) to exploit knowledge distillation even more effectively. In addition to the teacher model's output, ARDIR uses the teacher model's internal representation as a label for adversarial training. This allows the student model to be trained with richer, more informative labels, and as a result ARDIR learns more robust student models. Our experiments show that ARDIR outperforms previous methods.
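To make the idea concrete, below is a minimal PyTorch sketch of how a teacher's internal representation could serve as an additional label alongside its output during adversarial training. This is an illustrative sketch, not the paper's exact formulation: `ToyNet`, `pgd_attack`, `ardir_style_loss`, the MSE feature term, and the weights `lam` and `tau` are all assumptions introduced here for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNet(nn.Module):
    """Tiny stand-in network that exposes an internal representation."""
    def __init__(self, dim=32, classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(dim, 64), nn.ReLU())
        self.head = nn.Linear(64, classes)

    def forward(self, x):
        feat = self.body(x)            # internal representation
        return self.head(feat), feat   # (logits, features)

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft adversarial examples with PGD (attack settings are assumed)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv)[0], y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid input range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def ardir_style_loss(student, teacher, x_adv, lam=1.0, tau=1.0):
    """Distill both teacher outputs and teacher features on adversarial inputs.
    The weight `lam` and temperature `tau` are illustrative assumptions."""
    s_logits, s_feat = student(x_adv)
    with torch.no_grad():
        t_logits, t_feat = teacher(x_adv)
    # Output distillation: match the teacher's softened predictions.
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    # Internal-representation term: teacher features act as extra labels.
    rep = F.mse_loss(s_feat, t_feat)
    return kd + lam * rep

# Toy usage: one training step on random data.
student, teacher = ToyNet(), ToyNet()
x, y = torch.rand(8, 32), torch.randint(0, 10, (8,))
x_adv = pgd_attack(student, x, y)
ardir_style_loss(student, teacher, x_adv).backward()
```

The key design point this sketch illustrates is that the feature-matching term gives the student per-example supervision that is denser than the teacher's output distribution alone; how the two terms are balanced in ARDIR itself is specified in the method section.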