Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications. However, when deployed in the wild, these models may generate hallucinated translations, which can severely undermine user trust and raise safety concerns. Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in massively multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis of both the M2M family of conventional neural machine translation models and ChatGPT, a general-purpose large language model~(LLM) that can be prompted for translation. Our investigation covers a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs. We provide key insights regarding the prevalence, properties, and mitigation of hallucinations, paving the way towards more responsible and reliable machine translation systems.