Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. Its main advantage is that the large number of training sentences it requires is easy to collect, while on some translation tasks its performance is only slightly worse than that of supervised neural machine translation, which requires expensive annotated translation pairs. In most studies, UNMT is trained on clean data, without considering its robustness to noisy data. In real-world scenarios, however, the collected input sentences usually contain noise, which degrades the performance of the translation system because UNMT is sensitive to small perturbations of the input sentences. In this paper, we explicitly take noisy data into consideration for the first time in order to improve the robustness of UNMT-based systems. We first clearly define two types of noise in training sentences, word noise and word order noise, and empirically investigate their effect on UNMT. We then propose adversarial training methods with a denoising process for UNMT. Experimental results on several language pairs show that the proposed methods substantially improve the robustness of conventional UNMT systems in noisy scenarios.
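The abstract names the two noise types but does not say how they are constructed. As a rough illustration only, the sketch below assumes that word noise can be modeled as random word substitution and word order noise as a bounded local shuffle. The function names, the substitution probability `p`, and the displacement bound `k` are hypothetical and are not taken from the paper.

```python
import random


def add_word_noise(tokens, vocab, p=0.1, rng=None):
    """Assumed model of word noise: replace each token, independently and
    with probability p, by a word drawn uniformly from vocab."""
    rng = rng or random.Random()
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]


def add_word_order_noise(tokens, k=3, rng=None):
    """Assumed model of word order noise: a local shuffle in which no token
    ends up more than k positions away from where it started."""
    rng = rng or random.Random()
    # Add a random offset in [0, k + 1) to every index and sort by the result.
    # Two tokens more than k positions apart can never swap relative order.
    keys = [i + rng.random() * (k + 1) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=keys.__getitem__)
    return [tokens[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    sentence = "the input sentences usually contain noise".split()
    vocab = ["translation", "system", "data", "robust"]  # toy vocabulary
    print(add_word_noise(sentence, vocab, p=0.3, rng=rng))
    print(add_word_order_noise(sentence, k=2, rng=rng))
```

Perturbed sentences of this kind could serve either to probe how a trained system reacts to noisy input or as corrupted inputs for a denoising objective. Which of these the proposed methods use, and with what settings, is not specified in the abstract.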