Pronouns are frequently omitted in pro-drop languages such as Chinese, which poses significant challenges for producing complete translations. To date, the dropped pronoun (DP) problem has received very little attention in neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. First, DPs in all source sentences are automatically annotated using parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from the hidden representations of the NMT model. The reconstruction scores serve as auxiliary training objectives that guide the NMT parameters toward hidden representations which embed the annotated DP information as fully as possible. Experimental results on both Chinese-English and Japanese-English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline built directly on the training data annotated with DPs.
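To make the joint training setup described above concrete, the following is a minimal sketch of an objective that combines the translation likelihood with a reconstruction score; the interpolation weight \lambda, the symbol \hat{x} for the DP-annotated source, and the parameter names \theta and \gamma are our own illustrative notation rather than definitions taken from the paper:

J(\theta, \gamma) = \arg\max_{\theta, \gamma} \big\{ \log P(y \mid x; \theta) + \lambda \log R(\hat{x} \mid \mathbf{h}; \theta, \gamma) \big\},

where P(y \mid x; \theta) is the standard translation likelihood of target y given source x, R(\hat{x} \mid \mathbf{h}; \theta, \gamma) scores the reconstruction of the annotated source \hat{x} from the hidden representations \mathbf{h} of the NMT model, \theta denotes the NMT parameters, and \gamma denotes the additional reconstructor parameters. Under this sketch, maximizing the second term encourages the hidden representations to retain the annotated DP information.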