Most recent self-supervised learning (SSL) algorithms learn features by contrasting between instances of images, or by clustering the images and then contrasting between the image clusters. We introduce a simple mean-shift algorithm that learns representations by grouping images together without contrasting between them or adopting much prior on the structure of the clusters. We simply "shift" the embedding of each image to be close to the "mean" of the embeddings of its neighbors. Since, in our setting, the closest neighbor is always another augmentation of the same image, our model is identical to BYOL when using only one nearest neighbor instead of the 5 used in our experiments. Our model achieves 72.4% on ImageNet linear evaluation with ResNet50 at 200 epochs, outperforming BYOL. Our code is available here: https://github.com/UMBCvision/MSF
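The core objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes L2-normalized embeddings, a precomputed memory bank of target embeddings, and hypothetical names (`msf_loss`, `memory_bank`); in practice the neighbors come from a momentum-updated target encoder and the loss is backpropagated through the online encoder.

```python
import numpy as np

def msf_loss(query, memory_bank, k=5):
    """Mean-shift-style loss sketch: pull the query embedding toward
    the mean of its k nearest neighbors in the memory bank.

    query: (d,) embedding from the online encoder (assumption).
    memory_bank: (n, d) target embeddings (assumption).
    """
    # L2-normalize so cosine similarity is a dot product
    q = query / np.linalg.norm(query)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)

    # Find the k most similar bank entries to the query
    sims = bank @ q
    nn_idx = np.argsort(-sims)[:k]

    # Average squared L2 distance to the k neighbors;
    # for unit vectors, ||u - v||^2 = 2 - 2 u.v
    dists = ((bank[nn_idx] - q) ** 2).sum(axis=1)
    return dists.mean()
```

With k=1 and the query's own augmentation in the bank as its nearest neighbor, this reduces to a BYOL-style regression target, matching the abstract's observation.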