In open-domain question answering, a model receives a text question as input and searches a large evidence corpus for the correct answer. The retrieval step is especially difficult because typical evidence corpora contain \textit{millions} of documents, each of which may or may not contain the correct answer to the question. Very recently, dense models have replaced sparse methods as the de facto retrieval approach. Rather than relying on lexical overlap to determine similarity, dense methods build an encoding function that captures semantic similarity by learning from a small collection of question-answer or question-context pairs. In this paper, we investigate dense retrieval models in the context of open-domain question answering across different input distributions. To do this, we first introduce an entity-rich question answering dataset constructed from Wikidata facts and demonstrate that dense models are unable to generalize to unseen input question distributions. Second, we perform analyses aimed at better understanding the source of the problem and propose new training techniques to improve out-of-domain performance on a wide variety of datasets. We encourage the field to further investigate the creation of a single, universal dense retrieval model that generalizes well across all input distributions.
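As a rough illustration of the dense retrieval step described above, the following is a minimal sketch, not the specific models studied in this paper: questions and passages are mapped to fixed-size vectors by an encoding function, passage encodings are precomputed as an index, and relevance is scored by inner product. The \texttt{token\_vector} and \texttt{encode} functions here are hypothetical stand-ins for a trained neural bi-encoder.

\begin{verbatim}
import hashlib
import numpy as np

DIM = 128

def token_vector(token: str) -> np.ndarray:
    # Deterministic pseudo-random vector per token; a toy stand-in for
    # the learned subword embeddings inside a real trained encoder.
    seed = int.from_bytes(hashlib.md5(token.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(DIM)

def encode(text: str) -> np.ndarray:
    # Mean-pool token vectors into one dense representation, mimicking
    # the fixed-size encodings a question/passage encoder would produce.
    tokens = text.lower().split()
    vec = np.mean([token_vector(t) for t in tokens], axis=0)
    return vec / np.linalg.norm(vec)

passages = [
    "Paris is the capital and most populous city of France.",
    "The Amazon River is the largest river by discharge volume.",
]
# Passage encodings are computed once, offline, and stored as the index.
passage_matrix = np.stack([encode(p) for p in passages])

question = "What is the capital of France?"
# Retrieval reduces to maximum inner-product search over the index.
scores = passage_matrix @ encode(question)
print(passages[int(np.argmax(scores))])
\end{verbatim}

The key design point the sketch captures is that, unlike sparse methods, all corpus processing happens ahead of time: at query time, retrieval is a single maximum inner-product search over precomputed passage vectors.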