Standard empirical risk minimization (ERM) can underperform on certain minority groups (e.g., waterbirds on land or landbirds on water) because of spurious correlations between the input and its label. Several studies have improved worst-group accuracy by focusing on high-loss samples, under the hypothesis that such samples are \textit{spurious-cue-free} (SCF) samples. However, these approaches can be problematic because, in real-world scenarios, high-loss samples may also be samples with noisy labels. To resolve this issue, we use the predictive uncertainty of a model to improve worst-group accuracy under noisy labels. To motivate this, we show theoretically that, in the binary classification problem, high-uncertainty samples are the SCF samples. This result implies that predictive uncertainty is an adequate indicator for identifying SCF samples in a noisy-label setting. Motivated by this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to noisy labels. In the END framework, we first train an \textit{identification model} and use its predictive uncertainty to obtain the SCF samples from the training set. Then, another model is trained on the dataset augmented with an oversampled SCF set. Experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both noisy labels and spurious cues.
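The two-stage procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how uncertainty is estimated, how SCF samples are thresholded, or how heavily they are oversampled, so the entropy threshold, the `repeats` factor, and all function names below are assumptions made for illustration. The sketch takes class probabilities from an already trained identification model, scores each sample by predictive entropy, and builds the index array for an oversampled training set.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of an (n_samples, n_classes) array."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=1)


def select_scf_indices(probs, threshold):
    """Indices whose entropy exceeds `threshold` (hypothetical selection rule)."""
    return np.flatnonzero(predictive_entropy(probs) > threshold)


def oversampled_indices(n_samples, scf_idx, repeats):
    """Every sample once, plus `repeats` extra copies of each SCF sample."""
    return np.concatenate([np.arange(n_samples), np.repeat(scf_idx, repeats)])


# Toy identification-model outputs for four binary-classification samples.
probs = np.array([[0.99, 0.01], [0.55, 0.45], [0.02, 0.98], [0.50, 0.50]])

scf_idx = select_scf_indices(probs, threshold=0.5)  # assumed threshold
train_idx = oversampled_indices(len(probs), scf_idx, repeats=2)  # assumed factor
```

Note that the entropy score is computed from the predicted probabilities alone and never reads the label, unlike a loss-based score, which is why a confidently predicted sample with a flipped label would not be selected by this rule.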