Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) transfers knowledge from a large, well pre-trained teacher network to a small student network in a one-way process. Recently, deep mutual learning (DML) has been proposed to let student networks learn collaboratively and simultaneously. However, to the best of our knowledge, KD and DML have never been jointly explored in a unified framework to solve the knowledge distillation problem. In this paper, we observe that the teacher model provides more trustworthy supervision signals in KD, while the student captures behaviors more similar to the teacher's in DML. Based on these observations, we first propose to combine KD with DML in a unified framework. Furthermore, we propose a Semi-Online Knowledge Distillation (SOKD) method that effectively improves the performance of both the student and the teacher. In this method, we introduce the peer-teaching training scheme from DML to alleviate the student's imitation difficulty, while also leveraging the supervision signals provided by the well-trained teacher in KD. In addition, we show that our framework can be easily extended to feature-based distillation methods. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate that the proposed method achieves state-of-the-art performance.
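To make the combination of KD and DML concrete, below is a minimal training-step sketch, assuming a PyTorch setup with a frozen pre-trained teacher and two student peers. The function names, the loss weights `alpha`/`beta`, and the temperature `T` are illustrative assumptions, not the exact SOKD formulation described in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, target_logits, T=4.0):
    """Standard softened KL-divergence distillation loss."""
    p_t = F.softmax(target_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def train_step(teacher, student_a, student_b, optimizer, images, labels,
               alpha=0.5, beta=0.5, T=4.0):
    """One step combining offline KD (frozen teacher) with DML-style mutual learning."""
    teacher.eval()
    with torch.no_grad():  # the teacher is frozen, as in conventional KD
        t_logits = teacher(images)

    a_logits = student_a(images)
    b_logits = student_b(images)

    # Supervised cross-entropy with ground-truth labels for both peers.
    ce = F.cross_entropy(a_logits, labels) + F.cross_entropy(b_logits, labels)

    # KD term: both peers imitate the trustworthy soft targets of the teacher.
    kd = kd_loss(a_logits, t_logits, T) + kd_loss(b_logits, t_logits, T)

    # DML term: the peers mutually imitate each other's current predictions.
    dml = (kd_loss(a_logits, b_logits.detach(), T)
           + kd_loss(b_logits, a_logits.detach(), T))

    loss = ce + alpha * kd + beta * dml
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the KD term supplies the stable teacher supervision, while the DML term provides the peer-teaching signal that eases imitation; the relative weighting is a hypothetical choice for illustration.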