Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. The joint embedding approach is based on maximizing the agreement between embedding vectors produced from different views of the same image. Various methods have been proposed to solve the collapsing problem, where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that lead to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
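The notion of dimensional collapse can be made concrete by inspecting the singular value spectrum of the embedding covariance: if the vectors span only a lower-dimensional subspace, part of the spectrum drops to near zero. Below is a minimal, illustrative sketch of such a diagnostic (not taken from the paper); the synthetic rank-16 data and the dimensions are assumptions chosen for demonstration.

```python
import torch

def singular_value_spectrum(embeddings: torch.Tensor) -> torch.Tensor:
    """Sorted singular values of the covariance of a batch of embeddings.

    embeddings: (N, d) batch of embedding vectors.
    A sharp drop to near-zero values indicates dimensional collapse:
    the vectors span only a lower-dimensional subspace of R^d.
    """
    z = embeddings - embeddings.mean(dim=0, keepdim=True)  # center the batch
    cov = z.T @ z / (z.shape[0] - 1)                       # (d, d) covariance
    return torch.linalg.svdvals(cov)                       # descending order

# Example: vectors confined to a 16-dim subspace of a 128-dim space.
N, d, k = 4096, 128, 16
basis = torch.linalg.qr(torch.randn(d, k)).Q               # orthonormal (d, k)
z = torch.randn(N, k) @ basis.T                            # (N, d), rank <= k
spectrum = singular_value_spectrum(z)
print(spectrum[:k].min(), spectrum[k:].max())              # large vs. near zero
```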
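DirectCLR's key move, per the abstract, is to drop the trainable projector and optimize the representation space directly. A minimal sketch of that idea follows: an InfoNCE-style contrastive loss applied to a fixed slice of the backbone output. The slice length d0, the temperature, and the simplified symmetric loss are illustrative assumptions here, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def directclr_loss(r1: torch.Tensor, r2: torch.Tensor,
                   d0: int = 360, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss applied directly to a slice of the backbone output.

    r1, r2: (N, d) backbone representations of two augmented views of
    the same batch of images. Instead of passing them through a trainable
    projector, only the first d0 components are fed to the contrastive
    loss (d0 and temperature are assumed hyperparameters).
    """
    z1 = F.normalize(r1[:, :d0], dim=1)
    z2 = F.normalize(r2[:, :d0], dim=1)
    logits = z1 @ z2.T / temperature          # (N, N) cosine-similarity matrix
    labels = torch.arange(z1.shape[0], device=z1.device)
    # Diagonal entries (matching views) are positives; all other
    # pairs in the batch serve as negatives.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2
```

Any backbone producing (N, d) features with d >= d0 fits this sketch; at evaluation time the full d-dimensional representation, not just the slice, is used.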