Ideally, dialogue systems should generate responses that are faithful to the knowledge contained in relevant documents. However, many models instead generate hallucinated responses that contradict this knowledge or contain unverifiable information. To mitigate such undesirable behaviour, it has been proposed to fine-tune a 'negative expert' on negative examples and subtract its parameters from those of a pre-trained model. Intuitively, however, this does not take into account that some parameters are more responsible than others for causing hallucinations. We therefore propose to weight the individual importance of each parameter via (an approximation of) the Fisher Information matrix, which measures the uncertainty of the parameter estimates. We call this method Elastic Weight Removal (EWR). We evaluate our method, using different variants of Flan-T5 as a backbone language model, on multiple datasets for information-seeking dialogue generation and compare it with state-of-the-art techniques for faithfulness, such as CTRL, Quark, DExperts, and Noisy Channel reranking. Extensive automatic and human evaluation shows that EWR systematically increases faithfulness at a minor cost to other metrics. However, we notice that discouraging hallucinations alone may increase extractiveness, i.e., shallow copy-pasting of document spans, which can be undesirable. Hence, as a second main contribution, we show that our method can be extended to simultaneously discourage hallucinations and extractive responses. We publicly release the code for reproducing EWR and all baselines.
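To make the idea of importance-weighted parameter subtraction concrete, the following is a minimal sketch and not the released EWR implementation. It assumes a diagonal empirical Fisher approximation (the mean of squared per-example gradients) and one plausible per-parameter weighting rule. The function names `diagonal_fisher`, `uniform_removal`, and `fisher_weighted_removal`, the scaling factor `lam`, and the weighting formula itself are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def diagonal_fisher(per_example_grads):
    """Diagonal empirical Fisher approximation (an assumption of this sketch):
    the mean over examples of the squared gradient of each parameter."""
    grads = np.asarray(per_example_grads, dtype=float)
    return np.mean(grads ** 2, axis=0)


def uniform_removal(theta_pre, theta_neg, lam=1.0):
    """Baseline described in the abstract: subtract the change induced by the
    negative expert with the same strength for every parameter."""
    delta = theta_neg - theta_pre
    return theta_pre - lam * delta


def fisher_weighted_removal(theta_pre, theta_neg, fisher_pre, fisher_neg,
                            lam=1.0, eps=1e-12):
    """Hypothetical weighting rule: remove more of the negative expert's change
    where the negative expert's Fisher value is large relative to that of the
    pre-trained model. The exact rule used by EWR may differ."""
    delta = theta_neg - theta_pre
    weight = lam * fisher_neg / (fisher_pre + lam * fisher_neg + eps)
    return theta_pre - weight * delta


if __name__ == "__main__":
    # Toy two-parameter example with made-up values.
    theta_pre = np.array([1.0, 1.0])
    theta_neg = np.array([1.5, 1.5])
    fisher_pre = np.array([1.0, 1.0])
    fisher_neg = diagonal_fisher([[0.0, 1.0], [0.0, 1.0]])
    print(uniform_removal(theta_pre, theta_neg))
    print(fisher_weighted_removal(theta_pre, theta_neg, fisher_pre, fisher_neg))
```

In this toy setup the first parameter has zero Fisher value under the negative expert, so the weighted variant leaves it untouched, whereas uniform subtraction shifts both parameters equally.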