Teacher-free online Knowledge Distillation (KD) aims to collaboratively train an ensemble of student models that distill knowledge from each other. Although existing online KD methods achieve desirable performance, they often focus on class probabilities as the core knowledge type, ignoring valuable feature representational information. We present a Mutual Contrastive Learning (MCL) framework for online KD. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort of networks in an online manner. MCL aggregates cross-network embedding information and maximizes a lower bound on the mutual information between two networks. This enables each network to learn extra contrastive knowledge from the others, leading to better feature representations and thus improved performance on visual recognition tasks. Beyond the final layer, we extend MCL to intermediate layers with an adaptive layer-matching mechanism trained by meta-optimization. Experiments on image classification and on transfer learning to downstream visual recognition tasks show that layer-wise MCL yields consistent performance gains over state-of-the-art online KD approaches. These gains demonstrate that layer-wise MCL guides networks toward better feature representations. Our code is publicly available at https://github.com/winycg/L-MCL.
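To make the "transfer of contrastive distributions" concrete, below is a minimal sketch of one plausible instantiation: an InfoNCE-style contrastive loss computed across the embeddings of two peer networks, where each network treats the other network's embedding of the same image as the positive and its embeddings of other images as negatives. Minimizing InfoNCE maximizes a lower bound on the mutual information between the two embeddings. All names, the temperature value, and the exact formulation are assumptions for illustration, not the authors' actual implementation (see the repository for that).

```python
# Hypothetical sketch of a cross-network (mutual) contrastive loss.
# Assumes two peer networks produce embeddings for the same mini-batch.
import torch
import torch.nn.functional as F


def mutual_contrastive_loss(z_a: torch.Tensor,
                            z_b: torch.Tensor,
                            tau: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of the same batch from two peer networks."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    # Cross-network similarity matrix; matching samples lie on the diagonal.
    logits = z_a @ z_b.t() / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric InfoNCE: A contrasts against B's embeddings and vice versa,
    # so both networks receive gradient from the other's representation space.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Usage (illustrative): add the mutual term to each student's task loss.
# loss = ce_loss_a + ce_loss_b + mutual_contrastive_loss(net_a(x), net_b(x))
```

In the layer-wise extension described above, such a term would be computed at several intermediate stages as well, with per-layer-pair weights learned by meta-optimization rather than fixed by hand.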