Readers of academic research papers often read with the goal of answering specific questions. Question answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task, which arises from complex reasoning about claims made in multiple parts of a paper. In contrast, existing information-seeking question answering datasets usually contain questions about generic factoid-type information. We therefore present QASPER, a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. The questions are then answered by a separate set of NLP practitioners, who also provide supporting evidence for their answers. We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers, motivating further research in document-grounded, information-seeking QA, which our dataset is designed to facilitate.
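For context on the F1 figure above: answer quality in extractive QA is conventionally scored with SQuAD-style token-level F1, computed between a predicted answer string and a gold answer string. The sketch below illustrates that scoring; the function name `token_f1` and the simplified whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, F1 is 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a model answer that partially overlaps a gold answer.
print(token_f1("the full text of the paper", "full text"))  # 0.5
```

A gap of 27 F1 points on this scale is substantial: it means model answers share far fewer tokens with the gold answers than human answers do.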