Deploying deep neural networks on hardware with limited resources, such as smartphones and drones, is a significant challenge due to their high computational complexity. Knowledge distillation approaches aim to transfer knowledge from a large model to a lightweight one, referred to as the teacher and the student respectively, while distilling knowledge from intermediate layers provides additional supervision for this task. The capacity gap between the models, the information encoding that collapses architectural alignment, and the absence of appropriate learning schemes for transferring multiple layers restrict the performance of existing methods. In this paper, we propose a novel method, termed InDistill, that can drastically improve the performance of existing single-layer knowledge distillation methods by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain architectural alignment. Furthermore, we propose a curriculum learning-based scheme for enhancing the effectiveness of transferring knowledge from multiple intermediate layers. The proposed method surpasses the state of the art on three benchmark image datasets.
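To make the general idea concrete, below is a minimal PyTorch sketch of intermediate-layer distillation with channel-pruning-based alignment between a wider teacher and a narrower student. This is an illustration of the generic technique, not the InDistill procedure itself: the ToyConvNet architecture, the L1-norm channel-selection heuristic, and the temperature value are assumptions made for the sketch, and the paper's pruning criterion and curriculum learning scheme are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyConvNet(nn.Module):
    """Toy network exposing an intermediate feature map alongside its logits."""
    def __init__(self, width: int, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, width, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, padding=1)
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        feat = F.relu(self.conv1(x))           # intermediate feature map
        out = F.relu(self.conv2(feat))
        out = out.mean(dim=(2, 3))             # global average pooling
        return self.head(out), feat

teacher = ToyConvNet(width=64).eval()          # wider teacher
student = ToyConvNet(width=32)                 # narrower student

# Illustrative channel selection: keep the 32 teacher filters with the largest
# L1 norm (a standard pruning heuristic, not necessarily the paper's criterion),
# so the retained teacher channels align with the student's intermediate layer.
l1 = teacher.conv1.weight.detach().abs().sum(dim=(1, 2, 3))
keep = l1.topk(student.conv1.out_channels).indices

x = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

with torch.no_grad():
    t_logits, t_feat = teacher(x)
s_logits, s_feat = student(x)

# Task loss + logit distillation + direct feature matching on aligned channels.
T = 4.0                                        # softmax temperature (assumed value)
loss = (
    F.cross_entropy(s_logits, labels)
    + (T * T) * F.kl_div(F.log_softmax(s_logits / T, dim=1),
                         F.softmax(t_logits / T, dim=1),
                         reduction="batchmean")
    + F.mse_loss(s_feat, t_feat[:, keep])
)
loss.backward()
```

Because the retained teacher channels match the student's width, the feature-matching term needs no extra projection layer, which is the kind of architectural alignment the abstract refers to.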