Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist, but their impact on the convergence of the learning dynamics is not well understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch gradient descent (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class, but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from a higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings allow us to understand not only the potential and limitations of strategies involving the per-class gradients, but also the reason for the effectiveness of previously used solutions for class imbalance, such as oversampling.
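To make the per-class normalization concrete, the following is a minimal sketch of one full-batch GD step in which each class's gradient is rescaled to unit norm before being summed, so that majority and minority classes contribute equally to the descent direction. The function name, interface, and the assumption that per-class gradients are available as a dictionary are illustrative only, not the paper's implementation.

```python
import numpy as np

def per_class_normalized_gd_step(params, grads_by_class, lr=0.1):
    """One hypothetical GD step with per-class gradient normalization.

    `grads_by_class` maps each class label to the gradient of that class's
    average loss with respect to `params` (names are assumptions, not the
    authors' API).
    """
    update = np.zeros_like(params)
    for g in grads_by_class.values():
        norm = np.linalg.norm(g)
        if norm > 0:
            # Rescale each per-class gradient to unit norm so that no single
            # class (e.g. the majority class) dominates the update direction.
            update += g / norm
    return params - lr * update
```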