In this paper, we propose a genuine group-level contrastive visual representation learning method whose linear evaluation performance on ImageNet surpasses that of vanilla supervised learning. The two mainstream unsupervised learning schemes are instance-level contrastive frameworks and clustering-based methods. The former adopts extremely fine-grained instance-level discrimination, whose supervisory signal is inefficient due to false negatives. Although the latter avoids this problem, clustering-based methods commonly come with restrictions that limit performance. To integrate their advantages, we design SMoG (Synchronous Momentum Grouping). SMoG follows the framework of contrastive learning but changes the contrastive unit from instance to group, mimicking clustering-based methods. To achieve this, we propose a momentum grouping scheme that conducts feature grouping synchronously with representation learning. In this way, SMoG solves the supervisory-signal hysteresis problem that clustering-based methods usually face and reduces the false negatives of instance-level contrastive methods. We conduct exhaustive experiments showing that SMoG works well on both CNN and Transformer backbones. Results demonstrate that SMoG surpasses current state-of-the-art unsupervised representation learning methods. Moreover, its linear evaluation results exceed those obtained by vanilla supervised learning, and the learned representations transfer well to downstream tasks.
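To make the group-level contrast concrete, the sketch below shows one plausible reading of the core idea in PyTorch: instance features are contrasted against a bank of group features rather than against other instances, and the group features are updated by momentum synchronously with training. The names `num_groups`, `momentum`, and `tau`, and the exact update rule, are illustrative assumptions for this minimal sketch, not the paper's released implementation.

```python
# Minimal sketch of group-level contrastive learning with a synchronous
# momentum grouping step (illustrative assumptions, not the official code).
import torch
import torch.nn.functional as F


class MomentumGrouping(torch.nn.Module):
    def __init__(self, feat_dim=128, num_groups=3000, momentum=0.99, tau=0.1):
        super().__init__()
        self.momentum, self.tau = momentum, tau
        # Bank of group features ("prototypes"), updated by momentum, not by SGD.
        self.register_buffer(
            "groups", F.normalize(torch.randn(num_groups, feat_dim), dim=1))

    @torch.no_grad()
    def update_groups(self, feats):
        # Synchronous momentum update: each group feature drifts toward the
        # mean of the instance features currently assigned to it, so grouping
        # evolves together with representation learning instead of lagging
        # behind it (the "hysteresis" of offline clustering).
        assign = (feats @ self.groups.t()).argmax(dim=1)  # nearest group per instance
        for g in assign.unique():
            mean_feat = F.normalize(feats[assign == g].mean(0), dim=0)
            self.groups[g] = F.normalize(
                self.momentum * self.groups[g] + (1 - self.momentum) * mean_feat,
                dim=0)

    def forward(self, q, k):
        # q: L2-normalized features from the online encoder, shape [B, D];
        # k: L2-normalized features from the momentum encoder, shape [B, D].
        targets = (k @ self.groups.t()).argmax(dim=1)  # group assignment of each k
        logits = q @ self.groups.t() / self.tau        # contrast q against groups,
        loss = F.cross_entropy(logits, targets)        # not against other instances
        self.update_groups(k)                          # grouping runs with training
        return loss


# Hypothetical usage with two augmented views x1, x2 of the same batch:
#   q = F.normalize(encoder(x1), dim=1)
#   k = F.normalize(momentum_encoder(x2), dim=1).detach()
#   loss = grouping(q, k)
```

Because the positives and negatives are group features rather than individual instances, two images of the same semantic class no longer repel each other, which is how this formulation avoids the false negatives of instance-level discrimination.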