Knowledge distillation hinges on how to define and transfer knowledge from teacher to student effectively. Although recent self-supervised contrastive knowledge achieves the best performance, forcing the network to learn such knowledge may damage the representation learning of the original class recognition task. We therefore adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and the self-supervised auxiliary task. This is demonstrated to be richer knowledge that improves the representation power without sacrificing the normal classification capability. Moreover, previous methods transfer probabilistic knowledge only between the final layers, which is incomplete. We propose to append several auxiliary classifiers to hierarchical intermediate feature maps to generate diverse self-supervised knowledge and to perform one-to-one transfer so as to teach the student network thoroughly. Our method significantly surpasses the previous SOTA SSKD, with an average improvement of 2.56\% on CIFAR-100 and an improvement of 0.77\% on ImageNet across widely used network pairs. Code is available at https://github.com/winycg/HSAKD.
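The sketch below illustrates the core idea described above in PyTorch, not the authors' exact implementation: each auxiliary classifier attached to an intermediate feature map predicts a joint label space of size num_classes x num_transforms (four rotation-style transforms are assumed here), and the student mimics the teacher's joint distribution at matching stages via a temperature-scaled KL divergence. The head design, temperature, and variable names (e.g., student_feats, teacher_feats) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryClassifier(nn.Module):
    """Predicts the joint (class, self-supervised transform) label from one feature map."""

    def __init__(self, in_channels, num_classes, num_transforms=4):
        super().__init__()
        # Global average pooling followed by a linear head over the joint label space.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes * num_transforms)

    def forward(self, feat):                      # feat: (B, C, H, W)
        x = self.pool(feat).flatten(1)            # (B, C)
        return self.fc(x)                         # (B, num_classes * num_transforms)


def hierarchical_kd_loss(student_logits, teacher_logits, T=4.0):
    """One-to-one KL transfer between auxiliary classifiers at matching stages."""
    loss = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_t = F.softmax(t.detach() / T, dim=1)        # teacher's soft joint distribution
        log_p_s = F.log_softmax(s / T, dim=1)         # student's log-distribution
        loss += F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
    return loss / len(student_logits)


if __name__ == "__main__":
    # Usage sketch: intermediate feature maps would come from hooks on each backbone;
    # random tensors stand in for them here.
    num_classes, B = 100, 8
    student_feats = [torch.randn(B, c, 8, 8) for c in (64, 128, 256)]
    teacher_feats = [torch.randn(B, c, 8, 8) for c in (128, 256, 512)]
    s_heads = [AuxiliaryClassifier(f.shape[1], num_classes) for f in student_feats]
    t_heads = [AuxiliaryClassifier(f.shape[1], num_classes) for f in teacher_feats]
    s_logits = [h(f) for h, f in zip(s_heads, student_feats)]
    t_logits = [h(f) for h, f in zip(t_heads, teacher_feats)]
    print(hierarchical_kd_loss(s_logits, t_logits))
```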