This paper studies the bias problem of multi-hop question answering models: answering correctly without correct reasoning. One way to robustify these models is to supervise them not only to answer correctly, but also to do so with correct reasoning chains. An existing direction annotates reasoning chains to train models, which requires expensive additional annotations. In contrast, we propose a new approach to learn evidentiality, deciding whether an answer prediction is supported by correct evidence, without such annotations. Instead, we compare counterfactual changes in answer confidence with and without evidence sentences, to generate "pseudo-evidentiality" annotations. We validate our proposed model on an original set and a challenge set of HotpotQA, showing that our method is accurate and robust in multi-hop reasoning.
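To make the counterfactual comparison concrete, here is a minimal sketch (not the authors' code) of how pseudo-evidentiality labels could be derived: the model's answer confidence is measured with and without the candidate evidence sentences, and a large drop signals that the prediction genuinely depends on that evidence. The helper `answer_confidence` and the threshold `drop_threshold` are hypothetical names introduced for illustration only.

```python
# A minimal sketch (assumptions, not the authors' implementation) of deriving
# "pseudo-evidentiality" labels by comparing answer confidence with and
# without evidence sentences.
# `answer_confidence` is a hypothetical helper: it should return the model's
# probability for its predicted answer given a question and a context string.

from typing import Callable, List


def pseudo_evidentiality_label(
    answer_confidence: Callable[[str, str], float],
    question: str,
    evidence_sentences: List[str],
    other_sentences: List[str],
    drop_threshold: float = 0.2,  # assumed value, to be tuned in practice
) -> bool:
    """Return True if the answer prediction appears supported by the evidence."""
    full_context = " ".join(evidence_sentences + other_sentences)
    ablated_context = " ".join(other_sentences)  # evidence sentences removed

    conf_with_evidence = answer_confidence(question, full_context)
    conf_without_evidence = answer_confidence(question, ablated_context)

    # A large confidence drop when evidence is removed suggests the prediction
    # counterfactually depends on that evidence -> label as evidential.
    # Little or no drop suggests the model answers without reasoning over the
    # evidence -> label as non-evidential.
    return (conf_with_evidence - conf_without_evidence) >= drop_threshold
```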