Knowledge Distillation has shown very promising ability in transferring learned representations from the larger model (teacher) to the smaller one (student). Despite many efforts, prior methods ignore the important role of retaining the inter-channel correlation of features, and thus fail to capture the intrinsic distribution of the feature space and the rich diversity of features in the teacher network. To solve this issue, we propose the novel Inter-Channel Correlation for Knowledge Distillation (ICKD), with which the diversity and homology of the feature space of the student network can align with those of the teacher network. The correlation between two channels is interpreted as diversity if they are irrelevant to each other, and as homology otherwise. The student is then required to mimic this correlation within its own embedding space. In addition, we introduce grid-level inter-channel correlation, making the method applicable to dense prediction tasks. Extensive experiments on two vision tasks, ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consistently outperforms many existing methods and advances the state of the art in Knowledge Distillation. To our knowledge, ours is the first knowledge-distillation-based method to boost ResNet18 beyond 72% Top-1 accuracy on ImageNet classification. Code is available at: https://github.com/ADLab-AutoDrive/ICKD.
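The core idea of aligning inter-channel correlation can be illustrated with a minimal sketch: compute a C x C correlation (Gram-style) matrix over channels for both teacher and student features and penalize their mismatch. This is only an assumption-laden illustration, not the paper's exact formulation (which also includes the grid-level variant); the class name, the 1x1 alignment convolution, and the MSE objective are hypothetical choices here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterChannelCorrelationLoss(nn.Module):
    """Sketch of an inter-channel correlation distillation loss (assumed form)."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # Hypothetical 1x1 conv to match the student's channel count to the teacher's.
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    @staticmethod
    def _channel_correlation(feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> flatten spatial dims, then compute a C x C matrix
        # whose entries measure how correlated each pair of channels is.
        b, c, h, w = feat.shape
        feat = feat.view(b, c, h * w)
        corr = torch.bmm(feat, feat.transpose(1, 2))  # (B, C, C)
        return corr / (h * w)

    def forward(self, feat_student: torch.Tensor, feat_teacher: torch.Tensor) -> torch.Tensor:
        feat_student = self.align(feat_student)
        corr_s = self._channel_correlation(feat_student)
        corr_t = self._channel_correlation(feat_teacher.detach())
        # The student mimics the teacher's inter-channel correlation matrix.
        return F.mse_loss(corr_s, corr_t)


if __name__ == "__main__":
    # Toy usage: student features with 64 channels, teacher features with 128.
    loss_fn = InterChannelCorrelationLoss(student_channels=64, teacher_channels=128)
    fs = torch.randn(2, 64, 7, 7)
    ft = torch.randn(2, 128, 7, 7)
    print(loss_fn(fs, ft).item())
```

In practice this loss would be added, with some weighting, to the usual task loss (e.g., cross-entropy) during student training.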