We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning. Like other recent self-supervised learning methods, our method is based on maximizing the agreement among embeddings of different distorted versions of the same image, which pushes the encoder to produce transformation-invariant representations. To avoid the trivial solution in which the encoder outputs constant vectors, we regularize the covariance matrix of the embeddings from different images by penalizing low-rank solutions. By jointly minimizing the transformation invariance loss and the covariance contrast loss, we obtain an encoder that produces useful representations for downstream tasks. We analyze our method and show that it can be viewed as a variant of MoCo with an implicit memory bank of unlimited size at no extra memory cost, which makes our method perform better than alternatives at small batch sizes. TiCo can also be seen as a modification of Barlow Twins. By connecting the contrastive and redundancy-reduction methods together, TiCo gives us new insights into how joint embedding methods work.
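The two terms described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's exact loss: the invariance term pulls matched embeddings together, and the covariance term is a stand-in quadratic penalty that discourages the embedding variance from collapsing into a few directions (the function name `tico_like_loss` and the weight `beta` are hypothetical).

```python
import numpy as np

def tico_like_loss(z1, z2, beta=1.0):
    """Illustrative two-term loss: z1, z2 are (N, D) embeddings of two
    augmented views of the same N images. Not the exact TiCo objective."""
    # L2-normalize each embedding so both terms are scale-invariant
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    # Transformation invariance: mean squared distance between matched pairs
    invariance = np.mean(np.sum((z1 - z2) ** 2, axis=1))
    # Covariance contrast (stand-in): penalize the mean quadratic form
    # z^T C z, which is large when variance concentrates in few directions
    C = (z1.T @ z1) / z1.shape[0]                     # (D, D) covariance
    contrast = np.sum((z1 @ C) * z1) / z1.shape[0]    # mean of z_i^T C z_i
    return invariance + beta * contrast
```

When the two views coincide, the invariance term vanishes and only the covariance penalty remains; with mismatched random views, the invariance term is strictly positive.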