One of the vital breakthroughs in the history of machine translation is the development of the Transformer model. It is revolutionary not only for various translation tasks but also for a majority of other NLP tasks. In this paper, we present a Transformer-based system that translates a source sentence in German into its counterpart target sentence in English. We perform experiments on the news-commentary German-English parallel sentences from the WMT'13 dataset. In addition, we investigate the effect of including additional general-domain training data from the IWSLT'16 dataset on the Transformer model's performance. We find that including the IWSLT'16 dataset in training yields a gain of 2 BLEU points on the WMT'13 test set. A qualitative analysis is conducted to examine how the use of general-domain data helps improve the quality of the produced translations.