Information-seeking users often pose questions with false presuppositions, especially when asking about unfamiliar topics. Most existing question answering (QA) datasets, in contrast, assume all questions have well-defined answers. We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums. We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections. Through extensive baseline experiments, we show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct. This is in large part due to the difficulty of retrieving relevant evidence passages from a large text corpus. CREPE provides a benchmark for studying question answering in the wild, and our analyses suggest avenues for future work in better modeling and further study of the task.