Multi-hop question answering (QA) requires a model to retrieve and integrate information from different parts of a long text to answer a question. Humans answer this kind of complex question via a divide-and-conquer approach. In this paper, we investigate whether top-performing multi-hop QA models understand the underlying sub-questions as humans do. We adopt a neural decomposition model to generate sub-questions for a complex multi-hop question and then extract the corresponding sub-answers. We show that multiple state-of-the-art multi-hop QA models fail to correctly answer a large portion of sub-questions, even though the corresponding multi-hop questions are answered correctly. This indicates that these models answer the multi-hop questions by exploiting partial clues rather than truly understanding the reasoning paths. We also propose a new model that significantly improves performance on answering the sub-questions. Our work takes a step towards building more explainable multi-hop QA systems.
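The probing setup described above can be summarized as a simple consistency check: for each multi-hop question a model answers correctly, test whether it also answers every decomposed sub-question correctly. The sketch below illustrates this idea under stated assumptions; it is not the paper's implementation, and all names (`qa_model`, the `examples` schema) are hypothetical.

```python
# Minimal sketch of the sub-question consistency probe (hypothetical API).
# qa_model is assumed to be a callable: (context, question) -> answer string.

def consistent(qa_model, example):
    """example: dict with 'context', 'question', 'answer', and
    'sub_questions': a list of (sub_question, sub_answer) pairs."""
    pred = qa_model(example["context"], example["question"])
    if pred != example["answer"]:
        return None  # only probe cases where the multi-hop answer is correct
    return all(
        qa_model(example["context"], sq) == sa
        for sq, sa in example["sub_questions"]
    )

def sub_question_consistency(qa_model, examples):
    """Fraction of correctly answered multi-hop questions whose
    sub-questions are all answered correctly as well."""
    results = [consistent(qa_model, ex) for ex in examples]
    probed = [r for r in results if r is not None]
    return sum(probed) / max(len(probed), 1)
```

A low score from such a probe would mirror the paper's finding: the model gets the final answer right while failing on the intermediate reasoning steps.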