Existing techniques for mitigating dataset bias often leverage a biased model to identify biased instances. The influence of these instances is then reduced during the training of the main model to enhance its robustness to out-of-distribution data. A common core assumption of these techniques is that the main model handles biased instances similarly to the biased model, in that it resorts to biases whenever they are available. In this paper, we show that this assumption does not hold in general. We carry out a critical investigation of two well-known datasets in the domain, MNLI and FEVER, along with two biased-instance detection methods, partial-input and limited-capacity models. Our experiments show that for roughly a third to a half of the instances, the biased model is unable to predict the main model's behavior, as evidenced by the significantly different parts of the input on which the two models base their decisions. A manual validation further shows that this estimate is highly consistent with human interpretation. Our findings suggest that down-weighting instances flagged by bias detection methods, a widely practiced procedure, is an unnecessary waste of training data. We release our code to facilitate reproducibility and future research.
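To make the critiqued procedure concrete, the following is a minimal sketch of the down-weighting scheme the abstract refers to, assuming a PyTorch-style setup. The biased model's predictions are replaced here by random placeholder data, and all variable and module names (e.g., `main_model`, `biased_logits`) are illustrative rather than taken from the released code.

```python
# Sketch of one common down-weighting variant: an instance contributes less to
# the main model's loss the more confidently the biased model (e.g., a
# partial-input or limited-capacity model) predicts its gold label.
import torch
import torch.nn as nn

batch_size, num_features, num_classes = 8, 16, 3

# Placeholder batch: input features, gold labels, and the biased model's logits.
features = torch.randn(batch_size, num_features)
labels = torch.randint(0, num_classes, (batch_size,))
biased_logits = torch.randn(batch_size, num_classes)  # stand-in for a real biased model

# Main model: a simple linear classifier standing in for, e.g., a fine-tuned encoder.
main_model = nn.Linear(num_features, num_classes)
optimizer = torch.optim.Adam(main_model.parameters(), lr=1e-3)

# Down-weighting: weight = 1 - biased model's probability of the gold label.
with torch.no_grad():
    biased_probs = biased_logits.softmax(dim=-1)
    weights = 1.0 - biased_probs.gather(1, labels.unsqueeze(1)).squeeze(1)

# Weighted cross-entropy for one training step of the main model.
logits = main_model(features)
per_example_loss = nn.functional.cross_entropy(logits, labels, reduction="none")
loss = (weights * per_example_loss).mean()

loss.backward()
optimizer.step()
```

Other variants of the same idea (product-of-experts, confidence regularization) differ in how the biased model's output enters the loss, but all share the assumption that the abstract calls into question: that instances the biased model solves are the ones the main model would also solve via the bias.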