In ordinary distillation, student networks are trained with soft labels (SLs) given by pretrained teacher networks, and students are expected to improve upon teachers since SLs are stronger supervision than the original hard labels. However, when considering adversarial robustness, teachers may become unreliable and adversarial distillation may not work: teachers are pretrained on their own adversarial data, and it is too demanding to require that teachers are also good at all the adversarial data queried by students. Therefore, in this paper, we propose reliable introspective adversarial distillation (IAD), where students partially instead of fully trust their teachers. Specifically, IAD distinguishes between three cases given a query consisting of natural data (ND) and the corresponding adversarial data (AD): (a) if a teacher is good at AD, its SL is fully trusted; (b) if a teacher is good at ND but not AD, its SL is partially trusted and the student also takes its own SL into account; (c) otherwise, the student relies only on its own SL. Experiments demonstrate the effectiveness of IAD in improving upon teachers in terms of adversarial robustness.
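The three-case trust scheme above can be sketched as a simple soft-label blending rule. This is an illustrative sketch, not the paper's exact loss: the function name `blend_soft_labels`, the confidence `threshold`, and the choice of using the teacher's confidence on AD as the partial-trust weight are all assumptions made for the example.

```python
def blend_soft_labels(teacher_sl, student_sl, teacher_prob_ad, teacher_prob_nd,
                      threshold=0.5):
    """Blend teacher and student soft labels (class distributions as lists).

    teacher_prob_ad / teacher_prob_nd: the teacher's predicted probability
    on the true label for the adversarial / natural example, used here as a
    proxy for "the teacher is good at" AD / ND (hypothetical criterion).
    """
    if teacher_prob_ad >= threshold:
        # Case (a): teacher is good at AD -> fully trust its soft label.
        alpha = 1.0
    elif teacher_prob_nd >= threshold:
        # Case (b): teacher is good at ND but not AD -> partial trust,
        # weighted here by the teacher's confidence on AD (an assumption).
        alpha = teacher_prob_ad
    else:
        # Case (c): teacher is unreliable -> student relies on its own SL.
        alpha = 0.0
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_sl, student_sl)]
```

In a full training loop, the blended distribution would serve as the distillation target (e.g., via a KL-divergence term) for the student on the queried adversarial example.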