Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success is transferring knowledge from a large teacher network to a small student network. However, most existing knowledge distillation methods consider only one type of knowledge, learned from either instance features or instance relations via a specific distillation strategy in teacher-student learning. Few works explore the idea of transferring different types of knowledge with different distillation strategies in a unified framework. Moreover, the frequently used offline distillation suffers from limited learning capacity due to the fixed teacher-student architecture. In this paper, we propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT), which promotes both self-learning and collaborative learning. It allows multiple student networks to learn knowledge from both individual instances and instance relations in a collaborative way. While learning from themselves via self-distillation, the students can also guide each other via online distillation. Experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms state-of-the-art KD methods.
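To make the collaborative setting concrete, the sketch below shows one possible loss computation for two peer students trained online, combining cross-entropy, mutual distillation on softened logits (instance-level knowledge), and matching of within-batch pairwise similarity matrices (relation-level knowledge). This is a minimal illustration under our own assumptions, not the authors' exact formulation; the self-distillation branch is omitted, and the hyper-parameter names (alpha, beta, T) are illustrative.

```python
# Illustrative sketch of collaborative online distillation with two students.
# Not the exact CTSL-MKT objective: hyper-parameters and term weights are assumed.
import torch
import torch.nn.functional as F


def softened_kl(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened class distributions."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)


def relation_loss(feat_a, feat_b):
    """Match the two students' normalized pairwise similarity matrices over a batch."""
    a = F.normalize(feat_a.flatten(1), dim=1)
    b = F.normalize(feat_b.flatten(1), dim=1)
    return F.mse_loss(a @ a.t(), b @ b.t())


def collaborative_losses(logits1, feat1, logits2, feat2, targets,
                         alpha=1.0, beta=1.0, T=4.0):
    """One loss evaluation for two peer students learning collaboratively."""
    ce1 = F.cross_entropy(logits1, targets)
    ce2 = F.cross_entropy(logits2, targets)
    # Online (mutual) distillation on individual-instance predictions.
    kd1 = softened_kl(logits1, logits2.detach(), T)
    kd2 = softened_kl(logits2, logits1.detach(), T)
    # Instance-relation transfer between the students' intermediate features.
    rel1 = relation_loss(feat1, feat2.detach())
    rel2 = relation_loss(feat2, feat1.detach())
    loss1 = ce1 + alpha * kd1 + beta * rel1
    loss2 = ce2 + alpha * kd2 + beta * rel2
    return loss1, loss2
```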