Long-tailed image classification has recently attracted considerable research attention, since data distributions are long-tailed in many real-world scenarios. Numerous algorithms have been devised to address the data imbalance problem by biasing the training process towards less frequent classes. However, their performance is usually evaluated on a balanced testing set, or on multiple independent testing sets whose distributions differ from that of the training data. Since the testing data may follow arbitrary distributions, existing evaluation strategies cannot objectively reflect actual classification performance. We set up novel evaluation benchmarks based on a series of testing sets with evolving distributions. A set of metrics is designed to measure the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions. Based on our benchmarks, we re-evaluate the performance of existing methods on the CIFAR10 and CIFAR100 datasets, which is valuable for guiding the selection of data rebalancing techniques. We also revisit existing methods and categorize them into four types, namely data balancing, feature balancing, loss balancing, and prediction balancing, according to the stage of the training pipeline each one focuses on.
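The abstract does not spell out how the testing sets with evolving distributions are constructed. A common convention for long-tailed CIFAR variants subsamples each class following an exponential profile controlled by an imbalance ratio (the count ratio between the largest and smallest class); below is a minimal sketch assuming that protocol. The function name `long_tailed_indices` and the ratio grid are illustrative choices, not the paper's exact benchmark.

```python
import numpy as np

def long_tailed_indices(labels, imbalance_ratio, seed=None):
    """Subsample a balanced test set so per-class counts follow an
    exponential long-tailed profile. imbalance_ratio = n_max / n_min;
    a ratio of 1 reproduces the balanced set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = len(classes)
    counts = np.array([np.sum(labels == c) for c in classes])
    n_max = counts.min()  # per-class budget so sampling never exceeds a class
    selected = []
    for i, c in enumerate(classes):
        # exponential decay from n_max (head) down to n_max / imbalance_ratio (tail)
        frac = imbalance_ratio ** (-i / (K - 1))
        n_c = max(1, int(round(n_max * frac)))
        idx = np.flatnonzero(labels == c)
        selected.append(rng.choice(idx, size=n_c, replace=False))
    return np.concatenate(selected)

# Sweep a series of test distributions, from balanced (ratio 1) to
# increasingly skewed -- a hypothetical grid for illustration.
labels = np.repeat(np.arange(10), 1000)  # stand-in for CIFAR10 test labels
for ratio in [1, 2, 10, 50, 100]:
    idx = long_tailed_indices(labels, ratio, seed=0)
    print(ratio, np.bincount(labels[idx]))
```

Test sets skewed in the opposite direction of the training distribution (tail classes most frequent) can be obtained by reversing the class order before applying the same profile, which would let the sweep cover forward-skewed, balanced, and reversed distributions.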