Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinating additional content not supported by the input. Such fluent but unsupported outputs are particularly problematic because users cannot tell that they are being presented with incorrect content. To detect these errors, we propose a task of predicting whether each token in the output sequence is hallucinated (not contained in the input), and we collect new manually annotated evaluation sets for this task. We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data that includes automatically inserted hallucinations. Experiments on machine translation (MT) and abstractive summarization demonstrate that our proposed approach consistently outperforms strong baselines on all benchmark datasets. We further demonstrate how to use the token-level hallucination labels to define a fine-grained loss over the target sequence in low-resource MT and achieve significant improvements over strong baseline methods. We also apply our method to word-level quality estimation for MT and show its effectiveness in both supervised and unsupervised settings. Code and data are available at https://github.com/violet-zct/fairseq-detect-hallucination.
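As an illustration only (not taken from the released code at the link above), the sketch below shows one way token-level hallucination labels could be derived for synthetic training data: tokens in a corrupted target sentence that cannot be aligned back to the original target are marked as hallucinated. The function name and the use of difflib alignment are assumptions made for this example.

```python
# Minimal sketch (assumed, not the authors' exact pipeline): label each token
# of a synthetically corrupted target as hallucinated (1) if it cannot be
# aligned to the original target via an edit-distance style alignment.
import difflib
from typing import List


def label_hallucinations(original: List[str], corrupted: List[str]) -> List[int]:
    """Return one 0/1 label per token of `corrupted`; 1 = hallucinated."""
    labels = [1] * len(corrupted)  # assume hallucinated unless aligned
    matcher = difflib.SequenceMatcher(a=original, b=corrupted, autojunk=False)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 0  # token aligns to the original -> not hallucinated
    return labels


if __name__ == "__main__":
    orig = "the cat sat on the mat".split()
    corr = "the black cat sat on a large mat".split()
    print(list(zip(corr, label_hallucinations(orig, corr))))
    # [('the', 0), ('black', 1), ('cat', 0), ('sat', 0), ('on', 0),
    #  ('a', 1), ('large', 1), ('mat', 0)]
```

Labels produced this way could then serve as supervision for fine-tuning a pretrained language model as a token classifier, in the spirit of the synthetic-data approach described in the abstract.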