Self-supervised learning has achieved great success in representation learning for visual and textual data. However, current methods are mainly validated on well-curated datasets, which do not exhibit the real-world long-tailed distribution. Recent attempts at self-supervised long-tailed learning rebalance from the loss perspective or the model perspective, resembling the paradigms of supervised long-tailed learning. Nevertheless, without the aid of labels, these explorations have not shown the expected significant promise, owing to limitations in tail-sample discovery or heuristic structure design. Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method. Specifically, BCL leverages the memorization effect of deep neural networks to automatically drive the information discrepancy between the sample views in contrastive learning, which is more effective at enhancing long-tailed learning in the label-unaware context. Extensive experiments on a range of benchmark datasets demonstrate the effectiveness of BCL over several state-of-the-art methods. Our code is available at https://github.com/MediaBrain-SJTU/BCL.
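To make the data-perspective idea concrete, below is a minimal sketch of how the memorization effect could be used to modulate augmentation strength per sample in contrastive learning. It assumes a tracker that keeps an exponential moving average of each sample's contrastive loss; samples that the network memorizes slowly (higher momentum loss, a proxy for tail samples) receive stronger augmentation on one view, enlarging the information discrepancy between views. The class name `MomentumLossTracker`, the `beta` parameter, and the min-max normalization are illustrative assumptions, not the authors' exact implementation.

```python
import torch


class MomentumLossTracker:
    """Tracks an EMA of each sample's contrastive loss (a memorization proxy).

    Higher momentum loss -> the sample is harder to memorize (assumed tail)
    -> apply a stronger augmentation to one of its contrastive views.
    """

    def __init__(self, num_samples: int, beta: float = 0.9):
        self.beta = beta
        self.momentum_loss = torch.zeros(num_samples)

    @torch.no_grad()
    def update(self, indices: torch.Tensor, losses: torch.Tensor) -> None:
        # EMA update of the per-sample loss observed in the current epoch.
        self.momentum_loss[indices] = (
            self.beta * self.momentum_loss[indices]
            + (1.0 - self.beta) * losses.detach().cpu()
        )

    @torch.no_grad()
    def augmentation_strength(self, indices: torch.Tensor) -> torch.Tensor:
        # Normalize momentum losses to [0, 1]; larger values map to
        # stronger augmentation (e.g., a larger RandAugment magnitude).
        m = self.momentum_loss
        norm = (m - m.min()) / (m.max() - m.min() + 1e-12)
        return norm[indices]


# Illustrative usage inside a training loop (indices and per-sample losses
# are assumed to come from the dataloader and the contrastive criterion):
tracker = MomentumLossTracker(num_samples=50000)
indices = torch.tensor([0, 1, 2])
per_sample_loss = torch.tensor([2.3, 0.4, 1.1])
tracker.update(indices, per_sample_loss)
strength = tracker.augmentation_strength(indices)  # drives view augmentation
```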