Knowledge distillation (KD) is an effective framework that aims to transfer meaningful information from a large teacher to a smaller student. Generally, KD involves two aspects: how to define knowledge and how to transfer it. Previous KD methods often focus on mining various forms of knowledge, for example, feature maps and refined information. However, such knowledge is derived from the primary supervised task and is therefore highly task-specific. Motivated by the recent success of self-supervised representation learning, we propose an auxiliary self-supervision augmented task to guide networks to learn more meaningful features. From this task we derive soft self-supervision augmented distributions as richer dark knowledge for KD. Unlike previous forms of knowledge, this distribution encodes joint knowledge from supervised and self-supervised feature learning. Beyond knowledge exploration, another crucial aspect is how to learn and distill the proposed knowledge effectively. To take full advantage of hierarchical feature maps, we append several auxiliary branches at various hidden layers. Each auxiliary branch is guided to learn the self-supervision augmented task and to distill this distribution from teacher to student. We therefore call our method Hierarchical Self-Supervision Augmented Knowledge Distillation (HSSAKD). Experiments on standard image classification show that both offline and online HSSAKD achieve state-of-the-art performance in the field of KD. Further transfer experiments on object detection verify that HSSAKD guides the network to learn better features, which can be attributed to effectively learning and distilling the auxiliary self-supervision augmented task.
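To make the mechanism concrete, below is a minimal sketch of one auxiliary branch and its loss, assuming a rotation-based self-supervised task with 4 transforms and C object classes, so each branch predicts a joint (C x 4)-way self-supervision augmented distribution that is distilled from teacher to student with a softened KL term. The names (AuxBranch, hssakd_loss) and the exact loss weighting are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxBranch(nn.Module):
    """Auxiliary head appended to a hidden feature map of the backbone.

    Predicts a joint (num_classes * num_transforms)-way distribution over
    object class x self-supervised transform (assumed here to be rotation).
    """
    def __init__(self, in_channels, num_classes, num_transforms=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes * num_transforms)

    def forward(self, feat):                      # feat: (B, C_in, H, W)
        x = self.pool(feat).flatten(1)
        return self.fc(x)                         # (B, num_classes * num_transforms)


def hssakd_loss(student_logits, teacher_logits, joint_labels, T=4.0, alpha=1.0):
    """Supervised joint-task CE + KL distillation of the softened joint distribution."""
    ce = F.cross_entropy(student_logits, joint_labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kd
```

In training, each image would be transformed (e.g., rotated by 0, 90, 180, and 270 degrees), the joint label formed as class_id * num_transforms + transform_id, and the loss summed over all auxiliary branches attached to the student's hidden layers. This is one plausible instantiation consistent with the abstract, not the authors' exact recipe.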