Imbalanced data pose challenges for deep learning based classification models. One of the most widely used approaches for tackling imbalanced data is re-weighting, where training samples are assigned different weights in the loss function. Most existing re-weighting approaches treat the example weights as learnable parameters and optimize them on a meta set, entailing expensive bilevel optimization. In this paper, we propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view. Specifically, we view the training set as an imbalanced distribution over its samples, which is transported by OT to a balanced distribution obtained from the meta set. The weights of the training samples are the probability mass of the imbalanced distribution and are learned by minimizing the OT distance between the two distributions. Compared with existing methods, our approach decouples the weight learning from the concerned classifier at each iteration. Experiments on image, text, and point cloud datasets demonstrate that our proposed re-weighting method has excellent performance, achieving state-of-the-art results in many cases and providing a promising tool for addressing the imbalanced classification issue.