Knowledge distillation is a standard teacher-student learning framework for training a lightweight student network under the guidance of a well-trained, large teacher network. As an effective teaching strategy, interactive teaching is widely employed in schools to motivate students: teachers not only impart knowledge but also give constructive feedback on students' responses to improve their learning performance. In this work, we propose an InterActive Knowledge Distillation (IAKD) scheme that leverages the interactive teaching strategy for efficient knowledge distillation. In the distillation process, the interaction between the teacher and student networks is implemented by a swapping-in operation: randomly replacing blocks in the student network with the corresponding blocks in the teacher network. In this way, we directly involve the teacher's powerful feature-transformation ability to substantially boost the student's performance. Experiments with typical teacher-student network settings demonstrate that student networks trained with our IAKD outperform those trained with conventional knowledge distillation methods on diverse image classification datasets.
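The swapping-in operation described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the networks are represented as plain lists of block functions, and `swap_prob` is a hypothetical hyperparameter controlling how often a teacher block replaces the corresponding student block during a forward pass.

```python
import random

# Toy stand-ins for the teacher and student networks: each is a
# sequence of "blocks" (here, simple functions on a scalar). In the
# real setting these would be residual blocks with matching shapes.
teacher_blocks = [lambda x: x * 2, lambda x: x + 10, lambda x: x * 3]
student_blocks = [lambda x: x * 2 + 1, lambda x: x + 9, lambda x: x * 3 - 2]

def iakd_forward(x, swap_prob, rng=random):
    """Forward pass with the swapping-in operation: each student block
    is randomly replaced by the corresponding teacher block with
    probability `swap_prob`, so the teacher's transformations are
    directly mixed into the student's computation path."""
    for s_blk, t_blk in zip(student_blocks, teacher_blocks):
        blk = t_blk if rng.random() < swap_prob else s_blk
        x = blk(x)
    return x

# swap_prob = 0.0 recovers the pure student network;
# swap_prob = 1.0 recovers the pure teacher network.
print(iakd_forward(1, 0.0))  # pure student path
print(iakd_forward(1, 1.0))  # pure teacher path
```

Note that teacher and student blocks must produce compatible feature shapes at the swap points for this to work; in practice that constrains which block boundaries are eligible for swapping.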