Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation and cross-distillation. Self-distillation performs self-supervised learning for each model independently, while cross-distillation enables knowledge interaction between the two models. In cross-distillation, a cross-attention feature search strategy is proposed to enhance the semantic feature alignment between the models. As a result, the two models can absorb knowledge from each other to improve their representation learning performance. Extensive experimental results on different backbones and datasets demonstrate that two heterogeneous models can benefit from MOKD and outperform their independently trained baselines. In addition, MOKD also outperforms existing SSL-KD methods for both the student and the teacher models.
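To make the two distillation modes concrete, the following is a minimal, heavily simplified sketch of one training step combining a self-distillation term for each model with a cross-distillation term based on cross-attention. It assumes PyTorch; the module and function names (TinyEncoder, CrossAttentionSearch, self_distill_loss) are hypothetical placeholders, not the authors' released code, and details such as momentum teachers, projection heads, and the exact loss form are omitted.

```python
# Hedged sketch of a MOKD-style training step (hypothetical, simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F


def self_distill_loss(student_out, teacher_out, temp=0.1):
    # Cross-entropy between the detached "teacher" distribution and the
    # student distribution (DINO-style); momentum encoders are omitted here.
    t = F.softmax(teacher_out.detach() / temp, dim=-1)
    s = F.log_softmax(student_out / temp, dim=-1)
    return -(t * s).sum(dim=-1).mean()


class CrossAttentionSearch(nn.Module):
    # Hypothetical cross-attention feature search: a query from one model
    # attends over the other model's token features to align semantics.
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, query_feat, context_tokens):
        # query_feat: (B, 1, D) global feature; context_tokens: (B, N, D)
        out, _ = self.attn(query_feat, context_tokens, context_tokens)
        return out.squeeze(1)


class TinyEncoder(nn.Module):
    # Toy encoder standing in for a heterogeneous backbone (e.g. ResNet / ViT).
    def __init__(self, dim=64, n_tokens=16):
        super().__init__()
        self.proj = nn.Linear(3 * 32 * 32, dim * n_tokens)
        self.dim, self.n_tokens = dim, n_tokens

    def forward(self, x):
        tokens = self.proj(x.flatten(1)).view(x.size(0), self.n_tokens, self.dim)
        return tokens.mean(dim=1, keepdim=True), tokens  # global feature, tokens


model_a, model_b = TinyEncoder(), TinyEncoder()
search_a, search_b = CrossAttentionSearch(64), CrossAttentionSearch(64)
opt = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters())
    + list(search_a.parameters()) + list(search_b.parameters()), lr=0.05)

# Two augmented views of the same batch (random tensors for illustration).
x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)

ga1, ta1 = model_a(x1)
ga2, _ = model_a(x2)
gb1, tb1 = model_b(x1)
gb2, _ = model_b(x2)

# Self-distillation: each model learns independently across its two views.
loss_self = (self_distill_loss(ga1.squeeze(1), ga2.squeeze(1))
             + self_distill_loss(gb1.squeeze(1), gb2.squeeze(1)))

# Cross-distillation: each model's query searches the other model's tokens,
# and the retrieved feature is pulled toward the other model's output.
ca = search_a(ga1, tb1.detach())
cb = search_b(gb1, ta1.detach())
loss_cross = (self_distill_loss(ca, gb1.squeeze(1))
              + self_distill_loss(cb, ga1.squeeze(1)))

loss = loss_self + loss_cross
opt.zero_grad()
loss.backward()
opt.step()
```

The intent of the sketch is only to show how the total objective combines per-model self-distillation with bidirectional cross-distillation, so that each model both learns on its own and absorbs knowledge from its heterogeneous partner.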