Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning, which typically constructs positive pairs from rich intra-image transformations and then maximizes their agreement with a contrastive loss. The merits of inter-image invariance, however, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.
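To make the ingredients concrete, the following is a minimal illustrative sketch (not the authors' implementation) of an InfoNCE-style contrastive loss together with one common form of semi-hard negative selection of the kind the study compares. The embedding dimension, memory-bank size, temperature, margin, and the exact semi-hard criterion are assumptions for illustration; InterCLR's actual design may differ.

```python
# Illustrative sketch only: InfoNCE-style contrastive loss plus a common
# semi-hard negative selection rule. Shapes, hyperparameters, and the
# selection criterion are assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F


def info_nce(query, positive, negatives, temperature=0.07):
    """Pull `query` toward `positive`, push it away from `negatives`.

    query:     (D,)   embedding of one augmented view
    positive:  (D,)   embedding of another view of the same image
    negatives: (N, D) embeddings of other images
    """
    q = F.normalize(query, dim=0)
    pos = F.normalize(positive, dim=0)
    neg = F.normalize(negatives, dim=1)
    logits = torch.cat([(q * pos).sum().view(1), neg @ q]) / temperature
    # The positive pair sits at index 0, so the target label is 0.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))


def semi_hard_negatives(query, pos_sim, candidates, margin=0.1):
    """Keep negatives that are less similar to the query than the positive,
    but only by less than `margin` (a standard 'semi-hard' criterion)."""
    q = F.normalize(query, dim=0)
    sims = F.normalize(candidates, dim=1) @ q            # (N,) cosine similarities
    mask = (sims < pos_sim) & (sims > pos_sim - margin)  # neither too hard nor too easy
    return candidates[mask]


if __name__ == "__main__":
    d = 128
    q, p = torch.randn(d), torch.randn(d)
    bank = torch.randn(4096, d)                          # memory bank of other images
    pos_sim = (F.normalize(q, dim=0) * F.normalize(p, dim=0)).sum()
    negs = semi_hard_negatives(q, pos_sim, bank)
    print("loss:", info_nce(q, p, negs).item())
```

In this toy setup, intra-image invariance corresponds to the positive pair of two augmented views, while the negative pool and its sampling rule stand in for the inter-image side of the problem that the paper studies.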