Towards Solving Inefficiency of Self-supervised Representation Learning

Self-supervised learning (especially contrastive learning) has attracted great interest due to its potential for learning discriminative representations in an unsupervised manner. Despite the acknowledged successes, existing contrastive learning methods suffer from very low learning efficiency, e.g., taking about ten times more training epochs than supervised learning to reach comparable recognition accuracy. In this paper, we discover two contradictory phenomena in contrastive learning, which we call the under-clustering and over-clustering problems, that are major obstacles to learning efficiency. Under-clustering means that the model cannot efficiently learn to discover the dissimilarity between inter-class samples when the negative sample pairs for contrastive learning are insufficient to differentiate all the actual object categories. Over-clustering means that the model cannot efficiently learn the feature representation from excessive negative sample pairs, which force the model to over-cluster samples of the same actual category into different clusters. To overcome these two problems simultaneously, we propose a novel self-supervised learning framework using a median triplet loss. Specifically, we employ a triplet loss that tends to maximize the relative distance between the positive pair and negative pairs to address the under-clustering problem, and we construct the negative pair by selecting the negative sample with the median similarity score among all negative samples to avoid the over-clustering problem, which is guaranteed by the Bernoulli distribution model. We extensively evaluate our proposed framework on several large-scale benchmarks (e.g., ImageNet, SYSU-30k, and COCO). The results demonstrate the superior performance (e.g., the learning efficiency) of our model over the latest state-of-the-art methods by a clear margin. Code is available at: https://github.com/wanggrun/triplet.
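
To make the median-negative idea concrete, the following is a minimal NumPy sketch and not the authors' implementation (the repository linked above is the authoritative source). It rests on several assumptions that the abstract does not state: cosine similarity as the similarity score, a hinge-style triplet loss with a hypothetical `margin` parameter, in-batch negatives (every other sample's view serves as a negative), and the lower median when the number of negatives is even. The function name `median_triplet_loss` is illustrative only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def median_triplet_loss(anchors, positives, margin=0.5):
    """Hinge triplet loss using the median-similarity negative per anchor.

    anchors, positives: arrays of shape (n, d) with n >= 2; row i of
    `positives` is the positive view of row i of `anchors`. All other rows
    of `positives` are treated as negatives (an assumption of this sketch).
    """
    a = l2_normalize(np.asarray(anchors, dtype=float))
    p = l2_normalize(np.asarray(positives, dtype=float))
    sim = a @ p.T                      # (n, n) cosine similarities
    n = sim.shape[0]
    pos_sim = np.diag(sim)             # similarity of each positive pair

    # Collect the n - 1 off-diagonal (negative) similarities for each anchor.
    off_diagonal = ~np.eye(n, dtype=bool)
    neg_sim = sim[off_diagonal].reshape(n, n - 1)

    # Pick the negative whose similarity is the (lower) median, rather than
    # the hardest negative or the full set of negatives.
    sorted_neg = np.sort(neg_sim, axis=1)
    median_neg = sorted_neg[:, (n - 2) // 2]

    # Hinge: penalize when the positive is not more similar than the
    # median negative by at least `margin`.
    losses = np.maximum(0.0, margin - pos_sim + median_neg)
    return losses.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views_a = rng.normal(size=(8, 16))
    views_b = views_a + 0.1 * rng.normal(size=(8, 16))  # noisy second view
    print(median_triplet_loss(views_a, views_b))
```

Selecting a single mid-ranked negative per anchor keeps the relative-distance objective of a triplet loss while avoiding pressure from the many highly similar negatives that may belong to the same actual category; how the paper justifies this choice through its Bernoulli distribution model is not reproduced here.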
