Multi-hop question answering requires models to gather information from different parts of a text to answer a question. Most current approaches learn to address this task in an end-to-end way with neural networks, without maintaining an explicit representation of the reasoning process. We propose a method to extract a discrete reasoning chain over the text, which consists of a series of sentences leading to the answer. We then feed the extracted chains to a BERT-based QA model to do final answer prediction. Critically, we do not rely on gold annotated chains or "supporting facts": at training time, we derive pseudogold reasoning chains using heuristics based on named entity recognition and coreference resolution. Nor do we rely on these annotations at test time, as our model learns to extract chains from raw text alone. We test our approach on two recently proposed large multi-hop question answering datasets, WikiHop and HotpotQA, and achieve state-of-the-art performance on WikiHop and strong performance on HotpotQA. Our analysis shows which properties of chains are crucial for high performance: in particular, modeling extraction sequentially is important, as is treating each candidate sentence in a context-aware way. Furthermore, human evaluation shows that our extracted chains allow humans to give answers with high confidence, indicating that these are a strong intermediate abstraction for this task.