While the problem of hallucinations in neural machine translation has long been recognized, progress on alleviating it has so far been modest. Indeed, it recently turned out that, without artificially encouraging models to hallucinate, previously existing detection methods fall short, and even the standard sequence log-probability is more informative. This means that characteristics internal to the model can provide much more information than expected, and before resorting to external models and measures, we first need to ask: how far can we go using nothing but the translation model itself? We propose a method that evaluates the percentage of the source contribution to a generated translation. Intuitively, hallucinations are translations "detached" from the source, hence they can be identified by a low source contribution. This method improves detection accuracy for the most severe hallucinations by a factor of 2 and alleviates hallucinations at test time on par with the previous best approach, which relies on external models. Finally, moving beyond internal model characteristics and allowing external tools, we show that using sentence similarity from cross-lingual embeddings further improves these results.
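The detection criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a per-token contribution matrix is already available from some attribution method, and both the matrix layout and the threshold value are hypothetical.

```python
import numpy as np

def source_contribution(contribs: np.ndarray, n_src: int) -> float:
    """Mean fraction of the prediction attributed to source tokens.

    contribs: (n_tgt, n_src + n_tgt) row-stochastic matrix of token-level
    contributions (assumed to come from an attribution method); columns
    [0, n_src) correspond to source tokens, the rest to the target prefix.
    """
    per_token_src = contribs[:, :n_src].sum(axis=1)
    return float(per_token_src.mean())

def is_hallucination(contribs: np.ndarray, n_src: int,
                     threshold: float = 0.3) -> bool:
    # A translation "detached" from the source shows a low total
    # source contribution; the threshold here is illustrative.
    return source_contribution(contribs, n_src) < threshold

# Toy example: two target tokens, two source tokens, one prefix column.
normal = np.array([[0.4, 0.4, 0.2],
                   [0.5, 0.3, 0.2]])
detached = np.array([[0.10, 0.00, 0.90],
                     [0.05, 0.05, 0.90]])
print(is_hallucination(normal, n_src=2))    # low-risk translation
print(is_hallucination(detached, n_src=2))  # flagged as hallucination
```

In this toy setup the first matrix gives a source contribution of 0.8 and is kept, while the second gives 0.1 and is flagged.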