Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
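To make the three terms concrete, here is a minimal PyTorch sketch of a VICReg-style loss on two batches of embeddings. The hinge target of 1 on the per-dimension standard deviation and the weights `sim_coeff`, `std_coeff`, and `cov_coeff` are illustrative defaults, not values stated in this abstract.

```python
import torch
import torch.nn.functional as F


def off_diagonal(m: torch.Tensor) -> torch.Tensor:
    """Zero out the diagonal of a square matrix, keeping off-diagonal entries."""
    return m - torch.diag(torch.diag(m))


def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0, eps=1e-4):
    """VICReg-style loss on two (N, D) batches of embeddings from two views."""
    n, d = z_a.shape

    # Invariance: mean-squared distance between embeddings of the two views.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance: hinge keeping the std of each embedding dimension above 1,
    # which prevents collapse to constant vectors.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    std_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance: penalize off-diagonal entries of the covariance matrix,
    # decorrelating embedding dimensions (redundancy reduction).
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    cov_loss = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d

    return sim_coeff * sim_loss + std_coeff * std_loss + cov_coeff * cov_loss
```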