Despite the remarkable success deep models have achieved in Textual Matching (TM), their robustness remains a concern. In this work, we propose a new perspective on this issue: the length divergence bias of TM models. We show that this bias stems from two sources: the label bias of existing TM datasets and the sensitivity of TM models to superficial information. We critically examine widely used TM datasets and find that all of them exhibit label-specific length divergence distributions, providing direct cues for prediction. As for the TM models, we conduct adversarial evaluation and show that every model's performance drops on the out-of-distribution adversarial test sets we construct, demonstrating that they are all misled by the biased training sets. This is further confirmed by the \textit{SentLen} probing task, which shows that all models capture rich length information during training to boost their performance. Finally, to alleviate the length divergence bias in TM models, we propose a practical adversarial training method using bias-free training data. Our experiments indicate that it improves the robustness and generalization ability of models at the same time.
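The label bias described above can be checked directly on any TM dataset: group sentence pairs by label and compare the distributions of their length divergence. A minimal sketch, using hypothetical toy pairs and word-count length (the field names and data are illustrative, not from the paper):

```python
# Sketch: test whether length divergence |len(a) - len(b)| correlates with
# the match label -- the "label bias" that leaks direct cues for prediction.
# The pairs below are made-up toy data for illustration.
pairs = [
    ("how do I reset my password", "password reset steps", 1),
    ("how do I reset my password",
     "what is the capital of france and when was it founded", 0),
    ("best pizza near me", "top rated pizza places nearby", 1),
    ("best pizza near me",
     "a long unrelated question about quantum field theory basics", 0),
]

def length_divergence(a: str, b: str) -> int:
    """Absolute difference in token (whitespace) count between a pair."""
    return abs(len(a.split()) - len(b.split()))

# Collect divergences per label.
by_label = {0: [], 1: []}
for a, b, label in pairs:
    by_label[label].append(length_divergence(a, b))

# If the per-label means differ markedly, length divergence alone is a
# predictive cue, i.e. the dataset exhibits label bias.
mean_div = {label: sum(v) / len(v) for label, v in by_label.items()}
print(mean_div)  # on this toy data, non-matching pairs diverge more
```

On a real dataset one would compare the full per-label distributions (e.g. with a two-sample test) rather than means alone, but the mean gap already makes the leakage visible.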