Neural sequence generation models are known to "hallucinate" by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear under what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations by using them to design a lightweight hallucination detector, which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
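The abstract only names the key statistic, so here is a minimal, hypothetical sketch of how such a detector could work. It assumes per-token attribution matrices (e.g., from an attention- or relevance-based attribution method) are already computed, and flags an output as hallucinated when the mean relative source contribution is low. The function names, the attribution inputs, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def relative_source_contribution(src_contrib: np.ndarray,
                                 tgt_contrib: np.ndarray) -> np.ndarray:
    """Per-token fraction of attribution mass coming from the source.

    src_contrib: (T_out, T_src) attribution of each generated token
                 to each source token (hypothetical input).
    tgt_contrib: (T_out, T_out) attribution of each generated token
                 to previously generated target tokens.
    Returns a (T_out,) vector of values in [0, 1].
    """
    src_mass = np.abs(src_contrib).sum(axis=1)
    tgt_mass = np.abs(tgt_contrib).sum(axis=1)
    return src_mass / (src_mass + tgt_mass + 1e-9)

def detect_hallucination(src_contrib: np.ndarray,
                         tgt_contrib: np.ndarray,
                         threshold: float = 0.4):
    """Flag an output when its mean relative source contribution is low.

    The threshold is an assumed placeholder; in practice it would be
    tuned on held-out annotated data.
    """
    score = relative_source_contribution(src_contrib, tgt_contrib).mean()
    return score < threshold, score

# Toy usage with random attributions: 12 output tokens, 9 source tokens.
rng = np.random.default_rng(0)
src = rng.random((12, 9))
tgt = np.tril(rng.random((12, 12)), k=-1)  # causal: only previous target tokens
flagged, score = detect_hallucination(src, tgt)
print(f"mean source contribution = {score:.3f}, hallucination = {flagged}")
```

The intuition matched here is the one the abstract states: a hallucinated output is driven more by the decoder's own target prefix than by the source, so a drop in relative source contribution serves as an internal symptom that a lightweight classifier (here, a simple threshold) can exploit.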