Knowledge Distillation (KD) transfers knowledge from a high-capacity teacher model to promote a smaller student model. Existing efforts guide the distillation by matching prediction logits, feature embeddings, etc., while leaving how to efficiently utilize them in conjunction less explored. In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which excavates the knowledge from the teacher's hints in a dynamic scheme. The guidance effect of knowledge hints usually varies across instances and learning stages, which motivates us to adaptively customize a specific hint-learning manner for each instance. Specifically, a meta-weight network is introduced to generate instance-wise weight coefficients for the knowledge hints, conditioned on the dynamic learning progress of the student model. We further present a weight ensembling strategy that exploits historical statistics to eliminate the potential bias of the coefficient estimation. Experiments on the standard benchmarks CIFAR-100 and Tiny-ImageNet demonstrate that the proposed HKD effectively boosts knowledge distillation.
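To make the two ingredients of the abstract concrete, the following is a minimal NumPy sketch of one plausible reading: a small meta-weight network maps each instance's per-hint distillation losses (a stand-in for the student's "learning progress") to normalized weight coefficients, and the weight ensembling step is sketched as an exponential moving average over past estimates. All names (`MetaWeightNet`, `ensembled_weights`, the choice of input features, and the EMA form) are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the hint dimension.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MetaWeightNet:
    """Hypothetical two-layer meta-weight network: maps a per-instance
    state vector (here, the student's loss on each knowledge hint) to
    normalized weight coefficients over those hints."""

    def __init__(self, num_hints, hidden=16):
        self.W1 = rng.standard_normal((num_hints, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, num_hints)) * 0.1
        self.b2 = np.zeros(num_hints)

    def __call__(self, hint_losses):
        # hint_losses: (batch, num_hints) per-instance hint losses.
        h = np.maximum(hint_losses @ self.W1 + self.b1, 0.0)  # ReLU
        return softmax(h @ self.W2 + self.b2)  # rows sum to 1

def ensembled_weights(current_w, history_ema, momentum=0.9):
    """One reading of 'weight ensembling': blend the current estimate
    with a running average of past weights to damp estimation noise."""
    return momentum * history_ema + (1.0 - momentum) * current_w
```

A training loop would then use the resulting coefficients to combine the per-hint distillation losses into the student's objective, updating the EMA after each step.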