Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus more on the theoretical similarities between them. By designing contrastive and covariance-based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close those families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practices and show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low performance of SimCLR and show how it can match VICReg's with careful hyperparameter tuning, improving significantly over known baselines. We also challenge the popular assumptions that contrastive and non-contrastive methods, respectively, need large batch sizes and output dimensions. Our theoretical and quantitative results suggest that the numerical gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. The evidence shows that unifying different state-of-the-art methods is an important direction to build a better understanding of self-supervised learning.
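To make the two families concrete, here is a minimal NumPy sketch of a SimCLR-style contrastive (InfoNCE) criterion and a VICReg-style covariance-based non-contrastive criterion. This is an illustrative simplification, not the paper's exact formulation: the loss weights, temperature, and epsilon values are placeholder assumptions, and the real methods operate on projector outputs of a trained network.

```python
import numpy as np

def info_nce(za, zb, temperature=0.1):
    # Contrastive (SimCLR-style) criterion: pull each matched pair of
    # embeddings together while pushing apart all other pairs in the batch.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature               # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # matched pairs sit on the diagonal

def vicreg_like(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0):
    # Covariance-based non-contrastive (VICReg-style) criterion:
    # invariance + variance + covariance terms, with no negative pairs.
    n, d = za.shape
    invariance = np.mean((za - zb) ** 2)

    def variance_term(z):
        # Hinge on the per-dimension standard deviation to prevent collapse.
        std = np.sqrt(z.var(axis=0) + 1e-4)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance to decorrelate dimensions.
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return (off_diag ** 2).sum() / d

    return (sim_w * invariance
            + var_w * (variance_term(za) + variance_term(zb))
            + cov_w * (covariance_term(za) + covariance_term(zb)))
```

The structural contrast is visible here: the contrastive loss compares samples against each other across the batch (favoring larger batch sizes), while the covariance-based loss only compares statistics of embedding dimensions (favoring larger output dimensions) — the two regimes the abstract's equivalence result connects.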