We describe the CUNI translation system submitted to the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). Following the strategy of Artetxe et al. (2018b), we first build a seed phrase-based machine translation (PBMT) system whose phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, and then train a neural machine translation system on synthetic parallel data. The synthetic corpus was produced from a monolingual corpus by a tuned PBMT model refined through iterative back-translation. We further focus on the handling of named entities, i.e., the part of the vocabulary where the cross-lingual embedding mapping suffers most. Our system reaches a BLEU score of 15.3 on the German-Czech WMT19 shared task.
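To illustrate how a phrase table might be initialized from cross-lingual embeddings, the following minimal sketch scores each source entry against all target entries by cosine similarity and turns the scores into translation probabilities with a temperature softmax, in the spirit of Artetxe et al. (2018b). It is not the CUNI implementation. The function names, the toy two-dimensional vectors, the temperature value, and the `top_k` cutoff are all assumptions made for illustration, and the embeddings are assumed to be already mapped into a shared space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """Build a toy phrase table from embeddings in a shared space.

    For every source entry, a softmax over cosine similarities to all
    target entries gives translation probabilities; only the top_k
    candidates are kept. The temperature and top_k defaults are
    hypothetical, not values from the paper.
    """
    table = {}
    for src, u in src_emb.items():
        sims = {tgt: cosine(u, v) for tgt, v in tgt_emb.items()}
        # Subtract the maximum before exponentiating for numerical stability.
        max_sim = max(sims.values())
        weights = {t: math.exp((s - max_sim) / temperature) for t, s in sims.items()}
        total = sum(weights.values())
        probs = {t: w / total for t, w in weights.items()}
        best = sorted(probs, key=probs.get, reverse=True)[:top_k]
        table[src] = {t: probs[t] for t in best}
    return table


# Toy German and Czech embeddings (hypothetical values, shared space assumed).
src_emb = {"Hund": [0.9, 0.1], "Katze": [0.1, 0.9]}
tgt_emb = {"pes": [0.8, 0.2], "kočka": [0.2, 0.8], "dům": [0.6, 0.6]}

table = induce_phrase_table(src_emb, tgt_emb)
for src, candidates in table.items():
    print(src, candidates)
```

A lower temperature concentrates probability mass on the nearest neighbour, while a higher one spreads it across candidates. Rare words and named entities tend to have unreliable embeddings, so entries of that kind are where such a table is most likely to be wrong.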