Learning with Noisy Labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have "small loss". However, this assumption often fails to hold in real-world cases with imbalanced subpopulations, i.e., training subpopulations that vary in sample size or recognition difficulty. As a result, recent LNL methods risk misclassifying "informative" samples (e.g., hard samples or samples in the tail subpopulations) as noisy samples, which leads to poor generalization performance. To address this issue, we propose a novel LNL method that simultaneously deals with noisy labels and imbalanced subpopulations. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then uses the corrected labels for Distributionally Robust Optimization (DRO) to further improve robustness. Specifically, in contrast to previous works that use the classification loss as the selection criterion, we introduce a feature-based metric that takes sample correlation into account when estimating samples' clean probabilities. We then refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With the refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique consistently improves current state-of-the-art robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations.
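As a rough illustration of the three stages named above (feature-based clean-probability estimation, label refurbishment, and a DRO-style objective), the following minimal NumPy sketch may help. It is not the method's actual formulation, which the abstract does not specify. The nearest-neighbour agreement score, the convex-combination refurbishment rule, the worst-group loss, and all function names and the parameter `k` are assumptions chosen for illustration only.

```python
import numpy as np


def clean_probability(features, noisy_labels, k=5):
    """Hypothetical feature-based score (an assumption, not the paper's metric):
    the fraction of a sample's k nearest neighbours, by cosine similarity,
    that carry the same given label."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    agree = noisy_labels[neighbours] == noisy_labels[:, None]
    return agree.mean(axis=1)


def refurbish_labels(noisy_labels, pseudo_probs, clean_prob, num_classes):
    """Assumed refurbishment rule: a convex combination of the one-hot given
    label and the model's pseudo-label distribution, weighted by the
    estimated clean probability."""
    one_hot = np.eye(num_classes)[noisy_labels]
    w = clean_prob[:, None]
    return w * one_hot + (1.0 - w) * pseudo_probs


def worst_group_loss(per_sample_loss, group_ids):
    """One simple DRO-style objective (group DRO, assumed here): the largest
    mean loss over the subpopulations, instead of the overall mean."""
    groups = np.unique(group_ids)
    return max(per_sample_loss[group_ids == g].mean() for g in groups)
```

In this sketch, a sample whose label disagrees with all of its feature-space neighbours receives a clean probability of 0, so its refurbished label falls back entirely on the model's pseudo-label. A sample surrounded by same-label neighbours keeps its given label regardless of how large its loss is, which is the behaviour the abstract contrasts with small-loss selection.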