We present Meta Learning for Knowledge Distillation (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods, in which the teacher model is fixed during training. We show that the teacher network can learn to better transfer knowledge to the student network (i.e., learning to teach) by using feedback from the performance of the distilled student network within a meta learning framework. Moreover, we introduce a pilot update mechanism to improve the alignment between the inner-learner and the meta-learner in meta learning algorithms that focus on an improved inner-learner. Experiments on various benchmarks show that MetaDistil yields significant improvements over traditional KD algorithms and is less sensitive to the choice of student capacity and hyperparameters, facilitating the use of KD across different tasks and models. The code is available at https://github.com/JetRunner/MetaDistil.
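To make the bi-level update described above concrete, below is a minimal PyTorch sketch of a MetaDistil-style training step. It is not the authors' implementation from the linked repository: the toy MLPs, the inner learning rate, the temperature, and the equal soft/hard loss weighting are illustrative assumptions, and the real method operates on BERT-style models. The sketch shows the three pieces the abstract mentions: a trial (pilot) distillation update of the student kept in the autograd graph, a meta update of the teacher from the trial student's loss on a held-out quiz batch, and a re-done real student update against the now-updated teacher so the inner-learner and meta-learner stay aligned.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the teacher/student; the paper uses BERT-style models.
teacher = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
student = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
teacher_opt = torch.optim.Adam(teacher.parameters(), lr=1e-4)
student_opt = torch.optim.SGD(student.parameters(), lr=1e-2)
inner_lr = 1e-2      # assumed inner-loop step size for the pilot update
temperature = 2.0    # assumed distillation temperature


def kd_loss(student_logits, teacher_logits, labels):
    # Standard KD objective: softened KL term plus hard-label cross-entropy
    # (the 0.5/0.5 weighting here is an illustrative choice).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return 0.5 * soft + 0.5 * hard


def mlp_forward(params, x):
    # Run the two-layer student MLP with an explicit parameter list so that
    # the trial update below stays differentiable w.r.t. the teacher.
    h = F.relu(F.linear(x, params[0], params[1]))
    return F.linear(h, params[2], params[3])


def train_step(x_train, y_train, x_quiz, y_quiz):
    # --- Pilot (trial) update of the student, kept in the autograd graph ---
    s_params = list(student.parameters())
    t_logits = teacher(x_train)                      # teacher stays in the graph
    s_logits = mlp_forward(s_params, x_train)
    trial_loss = kd_loss(s_logits, t_logits, y_train)
    grads = torch.autograd.grad(trial_loss, s_params, create_graph=True)
    trial_params = [p - inner_lr * g for p, g in zip(s_params, grads)]

    # --- Meta update: teacher learns from the trial student's quiz loss ---
    quiz_logits = mlp_forward(trial_params, x_quiz)
    meta_loss = F.cross_entropy(quiz_logits, y_quiz)
    teacher_opt.zero_grad()
    meta_loss.backward()        # gradients flow back into the teacher
    teacher_opt.step()

    # --- Real student update, aligned with the *updated* teacher ---
    student_opt.zero_grad()     # discard spurious grads from the meta step
    with torch.no_grad():
        t_logits_new = teacher(x_train)
    real_loss = kd_loss(student(x_train), t_logits_new, y_train)
    real_loss.backward()
    student_opt.step()
    return real_loss.item(), meta_loss.item()


# Illustrative usage with random data (batch size 8, 16 features, 4 classes).
x_tr, y_tr = torch.randn(8, 16), torch.randint(0, 4, (8,))
x_qz, y_qz = torch.randn(8, 16), torch.randint(0, 4, (8,))
print(train_step(x_tr, y_tr, x_qz, y_qz))
```

Because the pilot update is discarded and the real student step is recomputed with the teacher that has already been meta-updated, the student always learns from the teacher it was used to evaluate, which is the alignment the pilot update mechanism is meant to provide.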