Self-supervised learning enables networks to learn discriminative features from massive unlabeled data. Most state-of-the-art methods build on contrastive learning and maximize the similarity between two augmentations of the same image. By exploiting the consistency between the two augmentations, the burden of manual annotation is removed. Contrastive learning exploits instance-level information to learn robust features, but the learned information is likely confined to different views of the same instance. In this paper, we attempt to leverage the similarity between two distinct images to improve representations in self-supervised learning. Compared with instance-level information, the similarity between two distinct images may provide more useful signal. In addition, we analyze the relation between the similarity loss and the feature-level cross-entropy loss. Both losses are essential to most deep learning methods, yet the relation between them remains unclear. The similarity loss helps to obtain instance-level representations, while the feature-level cross-entropy loss helps to mine the similarity between two distinct images. We provide theoretical analysis and experiments showing that a suitable combination of these two losses achieves state-of-the-art results.
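To make the combination of the two losses concrete, the following is a minimal sketch, not the authors' exact formulation: it assumes a negative cosine similarity between the projector outputs of two augmented views for the similarity loss, a softmax cross-entropy computed over the feature dimension for the feature-level cross-entropy loss, and a hypothetical weighting coefficient `lam` introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_loss(z1, z2):
    # Negative cosine similarity between the embeddings of the two views.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    return -(z1 * z2).sum(dim=1).mean()

def feature_level_cross_entropy(z1, z2):
    # Cross-entropy over the feature dimension: each view's embedding is
    # turned into a probability vector over feature channels, and one view's
    # distribution is matched against the other's.
    p1 = F.softmax(z1, dim=1)
    log_p2 = F.log_softmax(z2, dim=1)
    return -(p1 * log_p2).sum(dim=1).mean()

def total_loss(z1, z2, lam=1.0):
    # A weighted combination of the two losses; lam is a hypothetical
    # hyperparameter, not a value reported in the paper.
    return similarity_loss(z1, z2) + lam * feature_level_cross_entropy(z1, z2)

# Usage: z1 and z2 are projector outputs for two augmentations of the same batch.
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = total_loss(z1, z2)
```

The design choice here is simply to expose the two terms separately so that their relative weight can be tuned; how the weighting is actually set is the subject of the paper's analysis.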