Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate the adequacy of detection heuristics. In this paper, we lay the foundations for the study of NMT hallucinations. First, we work in a natural setting, i.e., in-domain data with no artificial noise in either training or inference. Next, we annotate a dataset of over 3.4k sentences, indicating different kinds of critical errors and hallucinations. Then, we turn to detection methods: we revisit previously used methods and propose using glass-box uncertainty-based detectors. Overall, we show that for preventive settings, (i) previously used methods are largely inadequate, and (ii) sequence log-probability works best and performs on par with reference-based methods. Finally, we propose DeHallucinator, a simple method for alleviating hallucinations at test time that significantly reduces the hallucination rate. To ease future research, we release our annotated dataset for WMT18 German-English data, along with the model, training data, and code.
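The sequence log-probability detector can be illustrated with a minimal Python sketch. It assumes the score is the length-normalized log-probability that the model assigns to its own translation, (1/L) Σ_k log p(y_k | y_<k, x), and that per-token log-probabilities are already available from the decoder. The function names and the threshold value are hypothetical and do not come from the released code. A translation is flagged when its score falls below the cutoff, which would need to be tuned, for example on annotated data.

```python
import math
from typing import List, Sequence


def seq_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized sequence log-probability: (1/L) * sum_k log p(y_k | y_<k, x)."""
    if not token_logprobs:
        raise ValueError("cannot score an empty translation")
    return sum(token_logprobs) / len(token_logprobs)


def flag_hallucinations(batch: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of translations whose score is below a (hypothetical) threshold."""
    return [i for i, lps in enumerate(batch) if seq_logprob(lps) < threshold]


if __name__ == "__main__":
    # Toy per-token probabilities, converted to log-probabilities.
    confident = [math.log(p) for p in (0.9, 0.8, 0.95)]   # mean approx. -0.13
    suspicious = [math.log(p) for p in (0.2, 0.1, 0.3)]   # mean approx. -1.71
    print(flag_hallucinations([confident, suspicious], threshold=-1.0))  # [1]
```

Averaging over tokens, rather than summing, keeps long translations from being penalized simply for containing more tokens.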