Self-supervised methods in vision have mostly focused on large architectures, as smaller architectures seem to suffer a significant performance drop. In this paper, we propose a simple self-supervised distillation technique that can train high-performance, low-compute neural networks. Our main insight is that existing joint-embedding SSL methods can be repurposed for knowledge distillation from a large self-supervised teacher to a small student model. We therefore call our method Replace one Branch (RoB), as it simply replaces one branch of the joint-embedding training with a large teacher model. RoB is widely applicable to a range of architectures such as small ResNets, MobileNets and ViTs, and to pretrained models such as DINO, SwAV or iBOT. When pretraining on the ImageNet dataset, RoB yields models that compete with supervised knowledge distillation. When applied to MSN, RoB produces students with strong semi-supervised capabilities. Finally, our best ViT-Tiny models improve over the prior SSL state-of-the-art on ImageNet by $2.3\%$ and are on par with or better than a supervised distilled DeiT on five downstream transfer tasks (iNaturalist, CIFAR, Clevr/Count, Clevr/Dist and Places). We hope RoB enables practical self-supervision at smaller scale.
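To make the "replace one branch" idea concrete, the sketch below shows one way a joint-embedding SSL step can be turned into distillation by swapping one branch for a frozen pretrained teacher while only the small student is updated. The specific backbones, projection head, and negative-cosine loss are illustrative assumptions, not the exact recipe used in the paper.

```python
# Minimal sketch of the RoB idea: take a two-branch joint-embedding setup and
# replace one branch with a frozen, pretrained teacher, so the usual SSL
# objective becomes a distillation loss for the low-compute student.
# Backbones, projector, and loss below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Hypothetical teacher/student pair: a large pretrained backbone standing in
# for an SSL teacher (e.g. DINO/SwAV/iBOT), and a small low-compute student.
teacher = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
teacher.fc = nn.Identity()          # expose the 2048-d backbone features
for p in teacher.parameters():
    p.requires_grad = False         # the teacher branch stays frozen
teacher.eval()

student = models.mobilenet_v3_small(weights=None)
student.classifier = nn.Identity()  # expose the 576-d backbone features

# Assumed projection head mapping student features into the teacher's space.
projector = nn.Sequential(nn.Linear(576, 2048), nn.ReLU(), nn.Linear(2048, 2048))

optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(projector.parameters()), lr=1e-3
)

def rob_step(view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
    """One step: the teacher embeds one augmented view, the student the other,
    and the student is pulled toward the teacher embedding (negative cosine)."""
    with torch.no_grad():
        target = teacher(view_a)                 # frozen teacher branch
    pred = projector(student(view_b))            # trainable student branch
    loss = -F.cosine_similarity(pred, target, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss

# Usage with dummy augmented views (replace with real augmentations/loader).
xa, xb = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
print(float(rob_step(xa, xb)))
```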