Knowledge Distillation (KD) is a popular technique for transferring knowledge from a teacher model or ensemble to a student model. Its success is generally attributed to the privileged information on similarities/consistency between the class distributions or intermediate feature representations of the teacher and student models. However, directly pushing the student model to mimic the probabilities/features of the teacher model largely limits the student model from learning knowledge/features that the teacher has not discovered. In this paper, we propose a novel inheritance and exploration knowledge distillation framework (IE-KD), in which a student model is split into two parts - inheritance and exploration. The inheritance part is trained with a similarity loss to transfer the existing learned knowledge from the teacher model to the student model, while the exploration part is encouraged to learn representations different from the inherited ones with a dissimilarity loss. Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks. Extensive experiments demonstrate that these two parts jointly push the student model to learn more diversified and effective representations, and that IE-KD serves as a general technique for improving the student network to achieve state-of-the-art performance. Furthermore, by applying our IE-KD to the training of two networks, the performance of both is improved compared with deep mutual learning. The code and models of IE-KD will be made publicly available at https://github.com/yellowtownhz/IE-KD.
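To make the inheritance/exploration idea concrete, the following is a minimal sketch of how such a split objective could look in PyTorch. It assumes the student's feature channels are split in half, that the inheritance half is pulled toward the teacher's (compressed) features with an L1 similarity loss, and that the exploration half is pushed away with a negated similarity term; the exact loss forms, the function name ie_kd_feature_loss, and the weight lambda_explore are illustrative assumptions, not the paper's definitive formulation.

import torch
import torch.nn.functional as F

def ie_kd_feature_loss(student_feat, teacher_feat, lambda_explore=1.0):
    """Hypothetical sketch of an inheritance/exploration feature objective.

    student_feat: (B, C, H, W) student features; the channel dimension is
        split into an inheritance half and an exploration half.
    teacher_feat: (B, C/2, H, W) teacher features (or a compact factor
        derived from them) that the inheritance half should mimic.
    """
    c = student_feat.size(1)
    inherit, explore = student_feat[:, : c // 2], student_feat[:, c // 2 :]

    # Flatten and L2-normalize so the losses compare feature directions,
    # not magnitudes.
    def norm_flat(x):
        return F.normalize(x.flatten(1), dim=1)

    t = norm_flat(teacher_feat)
    # Inheritance: pull the first half toward the teacher's representation.
    loss_inherit = F.l1_loss(norm_flat(inherit), t)
    # Exploration: push the second half away from the teacher's representation
    # (negated similarity), encouraging features the teacher has not learned.
    loss_explore = -F.l1_loss(norm_flat(explore), t)

    return loss_inherit + lambda_explore * loss_explore

In practice this distillation term would be added to the usual task loss (e.g., cross-entropy on labels) when training the student, which is how the framework can be combined with existing distillation or mutual learning setups.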