It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus. Earlier named entity translation methods mainly focus on phonetic transliteration, which ignores the sentence context and is limited in domain and language coverage. To address this limitation, we propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences. In addition, we investigate a multi-task learning strategy that finetunes a pre-trained neural machine translation model on both entity-augmented monolingual data and parallel data to further improve entity translation. Experimental results on three language pairs demonstrate that DEEP achieves significant improvements over strong denoising auto-encoding baselines, with gains of up to 1.3 BLEU and up to 9.2 entity accuracy points for English-Russian translation.
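To make the denoising pre-training objective concrete, below is a minimal sketch of how entity-masked training pairs could be constructed from monolingual text. The `make_denoising_example` helper and the `<ent>` mask token are hypothetical illustrations; the entity spans are assumed to come from a knowledge-base-backed entity linker, whose actual implementation in DEEP may differ.

```python
def make_denoising_example(sentence, entity_spans, mask_token="<ent>"):
    """Corrupt a monolingual sentence by masking its named-entity spans.

    entity_spans: list of (start, end) character offsets, assumed to be
    produced by an external entity linker backed by a knowledge base
    (hypothetical here; the paper's linking pipeline may differ).
    Returns a (noisy_source, clean_target) pair for seq2seq denoising.
    """
    noisy, prev = [], 0
    for start, end in sorted(entity_spans):
        noisy.append(sentence[prev:start])
        noisy.append(mask_token)  # replace the entity surface form with a mask
        prev = end
    noisy.append(sentence[prev:])
    return "".join(noisy), sentence


# The model is pre-trained to reconstruct the clean sentence, which forces
# it to generate entity surface forms from the surrounding context.
src, tgt = make_denoising_example(
    "Pushkin was born in Moscow.", [(0, 7), (20, 26)]
)
# src == "<ent> was born in <ent>."
# tgt == "Pushkin was born in Moscow."
```

Under this sketch, the multi-task finetuning stage would interleave such (noisy, clean) denoising batches with ordinary parallel translation batches, so the model retains the entity reconstruction signal while learning to translate.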