Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that performance can be boosted by first decomposing the questions into simpler, single-hop questions. In this paper, we explore an additional utility of multi-hop decomposition from the perspective of explainable NLP: creating explanations by probing a neural QA model with the decomposed sub-questions. We hypothesize that in doing so, users will be better able to construct a mental model of when the underlying QA system will give the correct answer. Through human participant studies, we verify that exposing users to the decomposition probes and the model's answers to those probes increases their ability to predict system performance on a per-question basis. We show that decomposition is an effective form of probing QA systems as well as a promising approach to explanation generation. In-depth analyses show the need for improvements in decomposition systems.
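The probing setup the abstract describes can be captured in a minimal sketch. The interfaces below (`decompose`, `answer`, and the toy stand-ins) are hypothetical placeholders rather than the paper's actual models: a decomposer maps a multi-hop question to single-hop sub-questions, the QA model answers each sub-question against the context, and the resulting (probe, answer) pairs form the explanation shown to users.

```python
from typing import Callable, List, Tuple

def decomposition_probe(
    question: str,
    context: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Probe a QA model with single-hop sub-questions derived from a
    multi-hop question. The returned (sub-question, answer) pairs are
    the explanation a user would inspect to judge whether the system
    is likely to answer the original question correctly."""
    sub_questions = decompose(question)
    return [(sq, answer(sq, context)) for sq in sub_questions]

# Toy stand-ins for the decomposer and the QA model (hypothetical;
# in practice these would be trained neural models).
def toy_decompose(question: str) -> List[str]:
    return [
        "Which team does the player play for?",
        "In which city is that team based?",
    ]

def toy_answer(question: str, context: str) -> str:
    return "stub answer"

if __name__ == "__main__":
    probes = decomposition_probe(
        "In which city is the team of the top scorer based?",
        "...context paragraphs...",
        toy_decompose,
        toy_answer,
    )
    for sub_q, ans in probes:
        print(f"Probe: {sub_q}\nModel answer: {ans}")
```

Under this reading, a user who sees a wrong or incoherent answer to an intermediate probe can infer that the system's final answer to the multi-hop question is also likely to be wrong, which is the mental-model-building effect the human participant studies evaluate.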