Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while the many minority categories contain only a limited number of samples. Classification models minimizing cross-entropy struggle to represent and classify the tail classes. Although the problem of learning unbiased classifiers has been well studied, methods for representing imbalanced data remain under-explored. In this paper, we focus on representation learning for imbalanced data. Recently, supervised contrastive learning (SCL) has shown promising performance on balanced data. However, through our theoretical analysis, we find that for long-tailed data it fails to form a regular simplex, which is an ideal geometric configuration for representation learning. To correct the optimization behavior of SCL and further improve the performance of long-tailed visual recognition, we propose a novel loss for balanced contrastive learning (BCL). Compared with SCL, BCL has two improvements: class-averaging, which balances the gradient contributions of negative classes, and class-complement, which allows all classes to appear in every mini-batch. The proposed BCL method satisfies the condition of forming a regular simplex and assists the optimization of cross-entropy. Equipped with BCL, our two-branch framework obtains a stronger feature representation and achieves competitive performance on long-tailed benchmark datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018. Our code is available at \href{https://github.com/FlamieZhu/BCL}{this URL}.
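To make the two modifications concrete, the following is a minimal PyTorch sketch of a contrastive loss with class-averaging and class-complement, written under our own assumptions: the class name \texttt{BalancedContrastiveLoss}, the learnable per-class prototypes, and the temperature \texttt{tau} are illustrative choices, not the authors' released implementation.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalancedContrastiveLoss(nn.Module):
    """Sketch of a balanced contrastive loss (hypothetical names,
    not the authors' released code)."""

    def __init__(self, num_classes, feat_dim, tau=0.1):
        super().__init__()
        self.tau = tau
        # class-complement: one learnable prototype per class, appended
        # to every mini-batch so that all classes always appear.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # feats: (B, D) embeddings; labels: (B,) integer class ids.
        C = self.prototypes.size(0)
        z = torch.cat([F.normalize(feats, dim=1),
                       F.normalize(self.prototypes, dim=1)], dim=0)
        y = torch.cat([labels, torch.arange(C, device=labels.device)])
        sim = torch.exp(z @ z.t() / self.tau)          # (B+C, B+C)
        not_self = (~torch.eye(len(y), dtype=torch.bool,
                               device=z.device)).float()
        pos = (y[None, :] == y[:, None]).float() * not_self
        # class-averaging: average exp-similarities within each class
        # before summing, so head classes with many samples cannot
        # dominate the denominator.
        onehot = F.one_hot(y, num_classes=C).float()   # (B+C, C)
        per_class_sum = (sim * not_self) @ onehot
        per_class_cnt = (not_self @ onehot).clamp(min=1)
        denom = (per_class_sum / per_class_cnt).sum(1, keepdim=True)
        log_prob = torch.log(sim / denom)
        # average the log-probability over each anchor's positives;
        # only the B real samples act as anchors.
        loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
        return loss[: feats.size(0)].mean()
\end{verbatim}

In the two-branch framework described above, such a contrastive term would be optimized jointly with a cross-entropy classification branch, with \texttt{feats} typically produced by a projection head on the shared backbone.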