Question answering models can draw on rich knowledge sources: up to one hundred retrieved passages, plus parametric knowledge stored in a large-scale language model (LM). Prior work assumes that the information in these sources is mutually consistent, paying little attention to how models blend information stored in their LM parameters with information from retrieved evidence documents. In this paper, we simulate knowledge conflicts (i.e., cases where parametric knowledge suggests one answer and different passages suggest different answers) and examine model behavior. We find that retrieval performance heavily impacts which sources models rely on, and that current models mostly rely on non-parametric knowledge in their best-performing settings. We discover a troubling trend: contradictions among knowledge sources affect model confidence only marginally. To address this issue, we present a new calibration study in which models are discouraged from presenting any single answer when presented with multiple conflicting answer candidates in the retrieved evidence.