Neural networks often make predictions that rely on spurious correlations in the datasets rather than on the intrinsic properties of the task of interest, and thus suffer sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks that try to capture specific dataset biases with bias annotations fail to handle complicated OOD scenarios. Others identify dataset bias implicitly through special designs of low-capability biased models or losses, but they degrade when the training and test data come from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space. It encourages the base model to focus on examples that are hard to solve with the biased models, so that it remains robust against spurious correlations at test time. GGD largely improves models' OOD generalization ability on various tasks, but it sometimes over-estimates the bias level and degrades on in-distribution tests. We therefore re-analyze the ensemble process of GGD and, inspired by curriculum learning, introduce Curriculum Regularization into GGD, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model under both settings: task-specific biased models with prior knowledge and a self-ensemble biased model without prior knowledge.
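The core idea of focusing the base model on examples the biased models fail to solve can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it shows only the simplest reweighting variant, where a per-example weight is derived from the biased model's confidence on the true label, and all names (`biased_logits`, `base_logits`, the weighting scheme) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the class dimension
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy setup: logits from a (frozen) biased model and the base model
# for N examples over C classes; labels are the ground truth.
N, C = 8, 3
labels = rng.integers(0, C, size=N)
biased_logits = rng.normal(size=(N, C))
base_logits = rng.normal(size=(N, C))

# Sketch of the greedy idea: down-weight examples the biased model
# already solves, so the base model's loss concentrates on examples
# that are hard for the bias ("residual" of the biased predictor).
p_biased = softmax(biased_logits)
w = 1.0 - p_biased[np.arange(N), labels]  # low weight where bias suffices

# Weighted cross-entropy for the base model on the remaining examples
p_base = softmax(base_logits)
ce = -np.log(p_base[np.arange(N), labels] + 1e-12)
loss = (w * ce).mean()
```

In a full training loop, `loss` would be back-propagated through the base model only, with the biased model trained first (greedily), mirroring how each new component in functional gradient descent fits what its predecessors left unexplained.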