Recent unsupervised contrastive representation learning follows a Single Instance Multi-view (SIM) paradigm, in which positive pairs are usually constructed by intra-image data augmentation. In this paper, we propose an effective approach called Beyond Single Instance Multi-view (BSIM). Specifically, we impose a more accurate instance discrimination capability by measuring the joint similarity between two randomly sampled instances and their mixture, which we call spurious-positive pairs. We believe that learning this joint similarity improves performance when the encoded features are distributed more evenly in the latent space. We apply BSIM as an orthogonal improvement to unsupervised contrastive representation learning methods, including the current leading methods SimCLR, MoCo, and BYOL. We evaluate the learned representations on many downstream benchmarks, such as linear classification on ImageNet-1k and PASCAL VOC 2007 and object detection on MS COCO 2017 and VOC. Compared with prior art, we obtain substantial gains, by a large margin, on almost all of these tasks.
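The idea of scoring a mixture against the two instances it came from can be illustrated with a small sketch. The abstract does not specify the mixing operation, the loss, or the hyperparameters, so everything below is an assumption: a mixup-style pixel-wise convex combination, a soft-target InfoNCE loss in which the mixture should match instance `i` with weight `lam` and instance `j` with weight `1 - lam`, a mixing coefficient drawn from Beta(1, 1), and a temperature `tau = 0.2`. The helpers `mix`, `encode`, `log_softmax`, and `joint_similarity_loss` are hypothetical names, and `encode` is a stand-in linear map rather than a real backbone. This is a toy instantiation of a spurious-positive objective, not the authors' implementation.

```python
import math
import random


def mix(x_a, x_b, lam):
    """Mixup-style convex combination of two instances (assumed); lam weights x_a."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def encode(x, weights):
    """Hypothetical encoder: a fixed linear map followed by L2 normalization."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_den = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_den for l in logits]


def joint_similarity_loss(z_mix, bank, i, j, lam, tau=0.2):
    """Soft-target InfoNCE (assumed form): the mixture should match bank[i]
    with weight lam and bank[j] with weight 1 - lam; every other entry in
    the bank acts as a negative."""
    logits = [sum(m * z for m, z in zip(z_mix, z_k)) / tau for z_k in bank]
    log_p = log_softmax(logits)
    return -(lam * log_p[i] + (1.0 - lam) * log_p[j])


if __name__ == "__main__":
    random.seed(0)
    dim_in, dim_out, batch = 8, 4, 6
    weights = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
    images = [[random.random() for _ in range(dim_in)] for _ in range(batch)]
    bank = [encode(x, weights) for x in images]

    i, j = random.sample(range(batch), 2)  # two randomly sampled instances
    lam = random.betavariate(1.0, 1.0)     # mixing coefficient (assumed Beta(1, 1))
    z_mix = encode(mix(images[i], images[j], lam), weights)
    print(f"lam={lam:.3f}  loss={joint_similarity_loss(z_mix, bank, i, j, lam):.4f}")
```

Because the targets `(lam, 1 - lam)` form a probability distribution, the loss is a cross-entropy and is always non-negative. In a real SIM pipeline such as SimCLR or MoCo, `bank` would hold the embeddings of the augmented views in the batch (or in a queue), and this term would be added to the method's usual single-instance loss.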