Reading comprehension QA tasks have seen a recent surge in popularity, yet most works have focused on fact-finding extractive QA. We instead focus on a more challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer. This type of multi-step reasoning also often requires understanding implicit relations, which humans resolve via external, background commonsense knowledge. We first present a strong generative baseline that uses a multi-attention mechanism to perform multiple hops of reasoning and a pointer-generator decoder to synthesize the answer. This model performs substantially better than previous generative models, and is competitive with current state-of-the-art span prediction models. We next introduce a novel system for selecting grounded multi-hop relational commonsense information from ConceptNet via a pointwise mutual information and term-frequency based scoring function. Finally, we effectively use this extracted commonsense information to fill in gaps of reasoning between context hops, using a selectively-gated attention mechanism. This boosts the model's performance significantly (also verified via human evaluation), establishing a new state-of-the-art for the task. We also show that our background knowledge enhancements are generalizable and improve performance on QAngaroo-WikiHop, another multi-hop reasoning dataset.
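The abstract names pointwise mutual information (PMI) as one ingredient of the commonsense scoring function but does not give the formula. As a generic illustration only, and not the authors' scoring function, the following minimal sketch computes standard document-level PMI, log p(a, b) / (p(a) p(b)), over a toy corpus. The names `pmi_table` and `toy_docs` and the toy data are assumptions introduced here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return PMI for every unordered word pair that co-occurs in a document.

    Probabilities are estimated as document frequencies: p(a) is the
    fraction of documents containing a, p(a, b) the fraction containing both.
    This is textbook PMI, not the scoring function from the paper.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc))  # count each word once per document
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # pairs are (a, b) with a < b

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus: each inner list is one tokenized document.
toy_docs = [
    ["lady", "church", "wedding"],
    ["lady", "church"],
    ["dog", "park"],
    ["dog", "church"],
]
scores = pmi_table(toy_docs)
```

In this toy corpus, "church" and "lady" co-occur more often than chance would predict and so receive a positive score, while "church" and "dog" receive a negative one. A selection step could rank candidate concepts by such a score, though how the paper combines PMI with term frequency is not stated in the abstract.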