Unlike conventional Knowledge Distillation (KD), Self-KD allows a network to learn knowledge from itself without any guidance from extra networks. This paper proposes to perform Self-KD from image Mixture (MixSKD), which integrates these two techniques into a unified framework. MixSKD mutually distills feature maps and probability distributions between random pairs of original images and their mixup images. It thereby guides the network to learn cross-image knowledge by modelling supervisory signals from mixup images. Moreover, we construct a self-teacher network by aggregating multi-stage feature maps to provide soft labels that supervise the backbone classifier, further improving the efficacy of self-boosting. Experiments on image classification, and on transfer learning to object detection and semantic segmentation, demonstrate that MixSKD outperforms other state-of-the-art Self-KD and data augmentation methods. The code is available at https://github.com/winycg/Self-KD-Lib.
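As a rough illustration of the probability-distribution branch only (not the authors' full method, which additionally distills feature maps and employs a self-teacher network; all names and the temperature value below are hypothetical), a minimal PyTorch sketch of distilling between a mixup prediction and the lambda-weighted mixture of the original-image predictions might look like:

```python
import torch
import torch.nn.functional as F

def mixup_distill_loss(logits_a, logits_b, logits_mix, lam, temperature=4.0):
    """Sketch of mixup-based distillation on probability distributions.

    logits_a, logits_b: predictions on the two original images x_a, x_b
    logits_mix: prediction on the mixup image lam * x_a + (1 - lam) * x_b
    lam: mixup coefficient, e.g. sampled from a Beta distribution
    """
    T = temperature
    # Soften the predictions with a distillation temperature.
    p_a = F.softmax(logits_a / T, dim=1)
    p_b = F.softmax(logits_b / T, dim=1)
    log_p_mix = F.log_softmax(logits_mix / T, dim=1)

    # Target for the mixup branch: the lambda-weighted mixture of the
    # original-image distributions, treated as a fixed teacher signal.
    target = (lam * p_a + (1.0 - lam) * p_b).detach()

    # KL divergence aligns the mixup prediction with the mixed target;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_mix, target, reduction="batchmean") * (T * T)


# Usage with a generic classifier `model` (hypothetical):
# lam = torch.distributions.Beta(1.0, 1.0).sample().item()
# x_mix = lam * x_a + (1.0 - lam) * x_b
# loss = mixup_distill_loss(model(x_a), model(x_b), model(x_mix), lam)
```

This sketch only illustrates the cross-image idea in one direction; the released code at https://github.com/winycg/Self-KD-Lib is the authoritative implementation.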