Knowledge Distillation (KD) is a popular technique for transferring knowledge from a teacher model or ensemble to a student model. Its success is generally attributed to the privileged information on similarities/consistency between the class distributions or intermediate feature representations of the teacher and student models. However, directly pushing the student model to mimic the probabilities/features of the teacher model largely limits the student model from learning knowledge/features that the teacher has not discovered. In this paper, we propose a novel inheritance and exploration knowledge distillation framework (IE-KD), in which a student model is split into two parts - inheritance and exploration. The inheritance part is trained with a similarity loss to transfer the existing learned knowledge from the teacher model to the student model, while the exploration part is encouraged to learn representations different from the inherited ones with a dissimilarity loss. Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks. Extensive experiments demonstrate that these two parts jointly push the student model to learn more diversified and effective representations, and that IE-KD serves as a general technique for improving the student network to achieve state-of-the-art performance. Furthermore, by applying our IE-KD to the training of two networks, the performance of both is improved compared with deep mutual learning. The code and models of IE-KD will be made publicly available at https://github.com/yellowtownhz/IE-KD.
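To make the inheritance/exploration idea concrete, the following is a minimal sketch of how such a split objective could look in PyTorch. It assumes the student's feature channels are split in half, that the inheritance half is pulled toward the teacher's (compressed) features with an L1 similarity loss, and that the exploration half is pushed away with a negated similarity term; the exact loss forms, the function name ie_kd_feature_loss, and the weight lambda_explore are illustrative assumptions, not the paper's definitive formulation.

import torch
import torch.nn.functional as F

def ie_kd_feature_loss(student_feat, teacher_feat, lambda_explore=1.0):
    """Hypothetical sketch of an inheritance/exploration feature objective.

    student_feat: (B, C, H, W) student features; the channel dimension is
        split into an inheritance half and an exploration half.
    teacher_feat: (B, C/2, H, W) teacher features (or a compact factor
        derived from them) that the inheritance half should mimic.
    """
    c = student_feat.size(1)
    inherit, explore = student_feat[:, : c // 2], student_feat[:, c // 2 :]

    # Flatten and L2-normalize so the losses compare feature directions,
    # not magnitudes.
    def norm_flat(x):
        return F.normalize(x.flatten(1), dim=1)

    t = norm_flat(teacher_feat)
    # Inheritance: pull the first half toward the teacher's representation.
    loss_inherit = F.l1_loss(norm_flat(inherit), t)
    # Exploration: push the second half away from the teacher's representation
    # (negated similarity), encouraging features the teacher has not learned.
    loss_explore = -F.l1_loss(norm_flat(explore), t)

    return loss_inherit + lambda_explore * loss_explore

In practice this distillation term would be added to the usual task loss (e.g., cross-entropy on labels) when training the student, which is how the framework can be combined with existing distillation or mutual learning setups.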