Self-supervised methods based on contrastive learning have achieved great success in unsupervised visual representation learning. However, most methods under this framework suffer from the problem of false negative samples. Inspired by mean shift for self-supervised learning, we propose a simple new framework, Multiple Sample Views and Queues (MSVQ). We jointly construct three soft labels on the fly using two complementary and symmetric approaches: multiple augmented positive views, and two momentum encoders that generate diverse semantic features for negative samples. The two teacher networks compute similarity relationships with the negative samples and transfer this knowledge to the student network. The student network learns to mimic these similarity relationships, which gives it a more flexible ability to identify false negative samples in the dataset. Classification results on four benchmark image datasets demonstrate the effectiveness and efficiency of our approach compared with several classical methods. Source code and pretrained models are available \href{https://github.com/pc-cp/MSVQ}{here}.
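The teacher-to-student transfer described above can be illustrated with a minimal NumPy sketch. It is not the released MSVQ implementation. It assumes that each soft label is a softmax over cosine similarities between a teacher embedding and a queue of negative features, and that the student is trained with a cross-entropy against that distribution. The function name `soft_label_loss`, the temperature values, and the choice to sum the losses from the two teachers are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def soft_label_loss(student_emb, teacher_emb, queue,
                    tau_student=0.1, tau_teacher=0.05):
    """Cross-entropy between teacher and student similarity distributions.

    student_emb, teacher_emb: (batch, dim) embeddings of two views.
    queue: (queue_size, dim) stored features treated as negatives.
    The temperatures are illustrative values, not taken from the paper.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    q = l2_normalize(queue)
    # Soft label: how similar the teacher thinks each queue entry is.
    target = np.exp(log_softmax(t @ q.T / tau_teacher))
    # Student prediction over the same queue entries.
    log_pred = log_softmax(s @ q.T / tau_student)
    return float(-(target * log_pred).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim, queue_size = 4, 8, 16
    student = rng.normal(size=(batch, dim))
    # Two hypothetical momentum teachers, each with its own queue.
    teachers = [rng.normal(size=(batch, dim)) for _ in range(2)]
    queues = [rng.normal(size=(queue_size, dim)) for _ in range(2)]
    # Assumption: the per-teacher losses are simply summed.
    total = sum(soft_label_loss(student, t, q)
                for t, q in zip(teachers, queues))
    print(f"total soft-label loss: {total:.4f}")
```

Because the target is a full distribution over the queue rather than a one-hot label, a queue entry that the teacher rates as highly similar is not pushed away as a hard negative, which is the intuition behind tolerating false negatives.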