Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores the information of the remaining students, while the latter increases the computational complexity. In this paper, we propose a novel online knowledge distillation method, termed FFSD, which comprises two key components, Feature Fusion and Self-Distillation, to solve the above problems in a unified framework. Different from previous works, where all students are treated equally, FFSD splits the students into a student leader and a set of common students. The feature fusion module then converts the concatenated feature maps of all common students into a fused feature map, and this fused representation is used to assist the learning of the student leader. To enable the student leader to absorb more diverse information, we further design an enhancement strategy that increases the diversity among the common students. In addition, a self-distillation module transforms the feature maps of deeper layers to match the shape of shallower ones; the shallower layers are then encouraged to mimic the transformed feature maps of the deeper layers, which helps the students generalize better. After training, we simply deploy the student leader, which outperforms the common students, without increasing the storage or inference cost. Extensive experiments on CIFAR-100 and ImageNet demonstrate the superiority of FFSD over existing works. The code is available at https://github.com/SJLeo/FFSD.
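To make the two components concrete, the following is a minimal PyTorch sketch of how the feature fusion and self-distillation modules could be realized. The 1x1-convolution fusion operator, the channel and spatial matching inside SelfDistillation, and all module names and dimensions are illustrative assumptions for exposition, not the exact design from the paper; the official implementation is in the linked repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    # Fuses the concatenated feature maps of the common students into one map.
    # NOTE: a 1x1 convolution is an assumed fusion operator, chosen for simplicity.
    def __init__(self, num_students: int, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(num_students * channels, channels, kernel_size=1)

    def forward(self, student_features):
        # student_features: list of (B, C, H, W) maps, one per common student.
        return self.fuse(torch.cat(student_features, dim=1))

class SelfDistillation(nn.Module):
    # Transforms a deeper feature map so that a shallower layer can mimic it.
    # NOTE: 1x1-conv channel projection plus bilinear resizing is an assumed
    # transformation; the paper only states that deeper maps are converted
    # to the shape of shallower ones.
    def __init__(self, deep_channels: int, shallow_channels: int):
        super().__init__()
        self.transform = nn.Conv2d(deep_channels, shallow_channels, kernel_size=1)

    def forward(self, deep_feat, shallow_feat):
        # Match the channel count, then the spatial size, of the shallow map.
        target = self.transform(deep_feat)
        target = F.interpolate(target, size=shallow_feat.shape[2:],
                               mode='bilinear', align_corners=False)
        # The shallower layer mimics the (detached) transformed deeper map.
        return F.mse_loss(shallow_feat, target.detach())

if __name__ == "__main__":
    # Fused map from three common students guides the student leader.
    fusion = FeatureFusion(num_students=3, channels=64)
    feats = [torch.randn(2, 64, 8, 8) for _ in range(3)]
    fused = fusion(feats)                                  # (2, 64, 8, 8)
    leader_feat = torch.randn(2, 64, 8, 8)
    fusion_loss = F.mse_loss(leader_feat, fused.detach())

    # Shallow layer (64ch, 8x8) mimics a transformed deep layer (128ch, 4x4).
    sd = SelfDistillation(deep_channels=128, shallow_channels=64)
    sd_loss = sd(torch.randn(2, 128, 4, 4), torch.randn(2, 64, 8, 8))
    print(fusion_loss.item(), sd_loss.item())

In a full training loop, such mimicry losses would be weighted and added to the usual cross-entropy and distillation objectives; the weights here are left out as they are hyperparameters of the method.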