Machine reading comprehension (MRC) requires reasoning over both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be solved well by context matching and therefore fail to test this capability. To encourage progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering a question requires understanding the document and the knowledge it involves. We implement a framework consisting of a question answering model and a question generation model, both of which take as input the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase, ProBase, Reverb, or NELL. Results show that incorporating side information from an external KB improves the accuracy of the baseline question answering system. We also compare it with a standard MRC model, BiDAF, discuss the difficulty of the dataset, and lay out the remaining challenges.
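To make the setup concrete, the sketch below illustrates the general idea of augmenting document-extracted knowledge with facts retrieved from an external KB before answering a question. It is a minimal toy illustration, not the paper's model: the `Triple` type, the entity-overlap retrieval, and the lexical-overlap scoring are all hypothetical placeholders for the neural components described in the paper.

```python
# Minimal sketch (hypothetical, not the authors' implementation):
# combine triples extracted from the document with facts retrieved from
# an external KB, then score candidates against the question by simple
# lexical overlap. Requires Python 3.9+.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str

    def words(self) -> set[str]:
        # Bag of lowercased words in the triple, used for overlap scoring.
        return set(f"{self.subject} {self.relation} {self.obj}".lower().split())


def retrieve_kb_facts(doc_triples: list[Triple], kb: list[Triple]) -> list[Triple]:
    """Return KB facts that mention an entity appearing in the document triples."""
    doc_entities = {t.subject for t in doc_triples} | {t.obj for t in doc_triples}
    return [f for f in kb if f.subject in doc_entities or f.obj in doc_entities]


def answer(question: str, doc_triples: list[Triple], kb: list[Triple]) -> str:
    """Pick the object of the fact (document or KB) that best overlaps the question."""
    facts = doc_triples + retrieve_kb_facts(doc_triples, kb)
    q_words = set(question.lower().split())
    best = max(facts, key=lambda t: len(t.words() & q_words))
    return best.obj


if __name__ == "__main__":
    doc = [Triple("the boy", "kicked", "the ball")]
    kb = [Triple("the ball", "is made of", "rubber")]
    print(answer("what is the ball made of", doc, kb))  # -> "rubber"
```

The example shows why the external facts matter: the answer "rubber" is not stated in the document at all, and only becomes reachable once the KB fact about "the ball" is pulled in alongside the document triples.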