Real-world data usually follow long-tailed distributions. Neural networks trained on such imbalanced data tend to perform well on head classes but much worse on tail classes. The main challenge is the severe sparseness of training instances for the tail classes, which results in biased distribution estimation during training. Considerable effort has been devoted to addressing this challenge, including data re-sampling and synthesizing new training instances for tail classes. However, no prior research has exploited transferable knowledge from head classes to calibrate the distribution of tail classes. In this paper, we hypothesize that tail classes can be enriched by similar head classes, and we propose a novel distribution calibration approach named label-Aware Distribution Calibration (LADC). LADC transfers the statistics of relevant head classes to infer the distribution of tail classes. Sampling from the calibrated distribution then helps re-balance the classifier. Experiments on both image and text long-tailed datasets demonstrate that LADC significantly outperforms existing methods. Visualization also shows that LADC provides a more accurate distribution estimate.
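The idea of borrowing head-class statistics can be illustrated with a minimal sketch. This is not the LADC implementation: the abstract does not specify how relevant head classes are selected or how their statistics are combined. The sketch assumes Gaussian class-conditional features, picks the `k` head classes whose means are closest to the tail-class mean, averages their statistics, and adds a small diagonal term `alpha`. The function names and the parameters `k` and `alpha` are hypothetical.

```python
import numpy as np


def class_stats(features, labels):
    """Return {class: (mean, covariance, count)} from a feature matrix."""
    stats = {}
    dim = features.shape[1]
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # Covariance is undefined for a single sample; fall back to zeros.
        cov = np.atleast_2d(np.cov(x, rowvar=False)) if len(x) > 1 else np.zeros((dim, dim))
        stats[int(c)] = (mean, cov, len(x))
    return stats


def calibrate_tail(tail_mean, head_stats, k=2, alpha=0.1):
    """Estimate a tail-class Gaussian from its k nearest head classes.

    Assumption (not from the paper): relevance is measured by Euclidean
    distance between class means, and statistics are combined by averaging.
    """
    nearest = sorted(
        head_stats.values(),
        key=lambda s: np.linalg.norm(s[0] - tail_mean),
    )[:k]
    # Mean: average of the tail mean and the selected head means.
    means = np.stack([tail_mean] + [m for m, _, _ in nearest])
    calibrated_mean = means.mean(axis=0)
    # Covariance: average of the selected head covariances, plus a small
    # diagonal term so the result stays well conditioned.
    covs = np.stack([c for _, c, _ in nearest])
    calibrated_cov = covs.mean(axis=0) + alpha * np.eye(len(tail_mean))
    return calibrated_mean, calibrated_cov


def sample_calibrated(mean, cov, n, rng):
    """Draw n synthetic tail-class features from the calibrated Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)
```

Features drawn with `sample_calibrated` could then be mixed with real tail-class features when re-training the classifier, so that each class contributes a comparable number of instances.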