Long-tailed datasets are frequently encountered in real-world use cases, where a few classes or categories (known as majority or head classes) have a much higher number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets produces results biased towards the head classes. To reduce this bias, researchers have so far proposed multiple weighted-loss and data re-sampling techniques. However, most such techniques assume that the tail classes are always the most difficult to learn and therefore need more weightage or attention. Here, we argue that this assumption may not always hold. We therefore propose a novel approach that dynamically measures the instantaneous difficulty of each class during training. Using these class-wise difficulty measures, we design a novel weighted-loss technique called `class-wise difficulty based weighted (CDB-W) loss' and a novel data-sampling technique called `class-wise difficulty based sampling (CDB-S)'. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. The results verified that CDB-W loss and CDB-S achieve state-of-the-art results on many class-imbalanced datasets that resemble real-world use cases, such as ImageNet-LT, LVIS and EGTEA.
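A minimal PyTorch sketch of the idea follows. The concrete choices here are illustrative assumptions, not the paper's definitive formulation: class-wise difficulty is taken to be one minus the class's current accuracy, and a hypothetical exponent `tau` plus the normalizations are stand-ins for however the weights and sampling probabilities are actually shaped.

```python
import torch
import torch.nn.functional as F

def class_difficulty(correct_per_class, total_per_class):
    """Instantaneous difficulty of each class, re-estimated during
    training; assumed here to be 1 - class-wise accuracy."""
    acc = correct_per_class.float() / total_per_class.clamp(min=1).float()
    return 1.0 - acc

def cdb_w_weights(difficulty, tau=1.0):
    """Per-class loss weights for CDB-W: difficulty raised to a power
    tau (a hypothetical smoothing hyperparameter), normalized so the
    weights average to 1 across classes."""
    w = difficulty.pow(tau)
    return w * w.numel() / w.sum().clamp(min=1e-8)

def cdb_s_probs(difficulty, tau=1.0):
    """Class sampling probabilities for CDB-S: proportional to
    difficulty^tau, so currently harder classes are sampled more often."""
    p = difficulty.pow(tau)
    return p / p.sum().clamp(min=1e-8)

# Usage sketch: after each epoch, recompute per-class statistics,
# refresh the weights, and apply them in the classification loss.
num_classes = 10
correct = torch.randint(0, 50, (num_classes,))      # correct predictions per class
total = torch.full((num_classes,), 50)              # samples seen per class
d = class_difficulty(correct, total)
weights = cdb_w_weights(d, tau=1.5)

logits = torch.randn(8, num_classes)
targets = torch.randint(0, num_classes, (8,))
loss = F.cross_entropy(logits, targets, weight=weights)  # CDB-W loss
```

Because the difficulties are recomputed as training progresses, a class that starts hard but becomes easy automatically loses weight, which is what distinguishes this dynamic scheme from static tail-class re-weighting.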