Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner. Ordinary CSL embeds the features extracted from neural networks onto specific topological structures. During training, the contrastive loss draws different views of the same input together while pushing the embeddings of different inputs apart. One drawback of CSL is that the loss term ideally requires a large number of negative samples to provide a tighter mutual information bound. However, increasing the number of negative samples via larger batch sizes also amplifies the effect of false negatives: semantically similar samples are pushed apart from the anchor, thus degrading downstream performance. In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework. The key insight is to employ a siamese-style metric loss that pulls intra-prototype features together while pushing inter-prototype features apart. We conduct extensive experiments on various benchmarks, and the results demonstrate the effectiveness of our method in improving the quality of visual representations. Specifically, our unsupervised pre-trained ResNet-50 with a linear probe outperforms its fully supervised counterpart on the ImageNet-1K dataset.
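To make the key insight concrete, below is a minimal, hypothetical PyTorch sketch of a siamese-style, prototype-based metric loss of the kind described above. The function name `prototype_metric_loss`, the hard nearest-prototype assignment, and the `margin` hinge on inter-prototype distances are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def prototype_metric_loss(z1, z2, prototypes, margin=1.0):
    """Hypothetical siamese-style metric loss over prototypes.

    z1, z2:      (N, D) embeddings of two augmented views of the same inputs
    prototypes:  (K, D) learnable prototype vectors
    """
    # L2-normalize embeddings and prototypes
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    protos = F.normalize(prototypes, dim=1)

    # Assign each sample to its nearest prototype (defines intra-prototype groups)
    assign = torch.argmax(z1 @ protos.t(), dim=1)          # (N,)

    # Intra-prototype term: pull both views toward their shared prototype
    pos = ((z1 - protos[assign]) ** 2).sum(dim=1) \
        + ((z2 - protos[assign]) ** 2).sum(dim=1)

    # Inter-prototype term: push samples away from all other prototypes,
    # with a hinge so prototypes beyond `margin` contribute no gradient
    dists = torch.cdist(z1, protos)                        # (N, K) Euclidean distances
    mask = F.one_hot(assign, protos.size(0)).bool()        # marks the assigned prototype
    neg = F.relu(margin - dists).pow(2).masked_fill(mask, 0.0).sum(dim=1)

    return (pos + neg).mean()
```

Note that, unlike an InfoNCE-style loss, this sketch contrasts samples against a fixed set of prototypes rather than against other samples in the batch, so its memory cost does not grow with the number of in-batch negatives.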