Knowledge distillation is a method of transferring knowledge from a pretrained, complex teacher model to a student model, so that a smaller network can replace the large teacher network at deployment. To remove the need to train a large teacher model, recent literature has introduced self-knowledge distillation, which trains a student network progressively to distill its own knowledge without a pretrained teacher. Self-knowledge distillation is largely divided into data-augmentation-based and auxiliary-network-based approaches; however, the data augmentation approach loses local information during augmentation, which hinders its applicability to diverse vision tasks such as semantic segmentation. Moreover, these self-knowledge distillation approaches do not exploit refined feature maps, which are prevalent in the object detection and semantic segmentation communities. This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD), which utilizes an auxiliary self-teacher network to transfer refined knowledge to the classifier network. FRSKD can utilize both soft-label and feature-map distillation for self-knowledge distillation. Therefore, FRSKD can be applied to classification as well as to semantic segmentation, which emphasizes preserving local information. We demonstrate the effectiveness of FRSKD through its performance improvements on diverse tasks and benchmark datasets. The implemented code is available at https://github.com/MingiJi/FRSKD.
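To make the two distillation signals concrete, the following is a minimal PyTorch-style sketch of a combined self-distillation objective: a supervised cross-entropy term, a soft-label term (KL divergence against the self-teacher's temperature-scaled predictions), and a feature-map term matching intermediate features. It is not the authors' implementation (see the repository above); the function name, the MSE feature loss, and the weights `alpha`, `beta`, and temperature `T` are illustrative assumptions.

```python
# Sketch of a self-distillation loss with soft-label and feature-map terms.
# Assumes `classifier_*` comes from the student classifier and `teacher_*`
# from an auxiliary self-teacher network; hyperparameters are hypothetical.
import torch
import torch.nn.functional as F

def self_distillation_loss(classifier_logits, classifier_feats,
                           teacher_logits, teacher_feats,
                           targets, alpha=1.0, beta=1.0, T=4.0):
    # Standard supervised loss on hard labels.
    ce = F.cross_entropy(classifier_logits, targets)

    # Soft-label distillation: KL divergence between temperature-scaled
    # predictions; the self-teacher is detached so gradients here update
    # only the classifier branch.
    kd = F.kl_div(
        F.log_softmax(classifier_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Feature-map distillation: match the classifier's intermediate feature
    # maps to the refined feature maps produced by the self-teacher.
    feat = sum(
        F.mse_loss(f_s, f_t.detach())
        for f_s, f_t in zip(classifier_feats, teacher_feats)
    ) / len(classifier_feats)

    return ce + alpha * kd + beta * feat
```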