Supervised machine learning models and their evaluation strongly depend on the quality of the underlying dataset. When we search for a relevant piece of information, it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-ranking. The excessive favoring of earlier positions inside passages is an unwanted artefact. It leads three common Transformer-based re-ranking models to ignore relevant parts of unseen passages. More concerning, because the evaluation set is drawn from the same biased distribution, models that overfit to this bias overestimate their true effectiveness. In this work we analyze position bias in the datasets and the contextualized representations, and its effect on retrieval results. We propose a debiasing method for retrieval datasets. Our results show that a model trained on a position-biased dataset exhibits a significant decrease in re-ranking effectiveness when evaluated on a debiased dataset. We demonstrate that by mitigating the position bias, Transformer-based re-ranking models are equally effective on biased and debiased datasets, as well as more effective in a transfer-learning setting between two differently biased datasets.
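The position bias described above can be made concrete by measuring where gold answers start within their passages. The following is a minimal sketch, not the paper's exact analysis, assuming a SQuAD-style JSON file with character-level answer_start offsets; the file path, field names, and function name are illustrative.

# Sketch: histogram of answer start positions, normalized by passage length.
# Assumes SQuAD-style fields (context, qas, answers, answer_start); adapt as needed.
import json
from collections import Counter

def relative_answer_positions(path, bins=10):
    """Count gold answer starts per relative-position bin (0 = passage start)."""
    histogram = Counter()
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            passage_len = max(len(paragraph["context"]), 1)
            for qa in paragraph["qas"]:
                for answer in qa.get("answers", []):
                    rel = answer["answer_start"] / passage_len
                    histogram[min(int(rel * bins), bins - 1)] += 1
    return histogram

# Usage: a histogram heavily skewed toward the first bins indicates the kind of
# position bias discussed in the abstract.
# print(relative_answer_positions("train-v1.1.json"))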