Neural machine translation (NMT) has achieved great success on large datasets, and is therefore largely premised on high-resource languages. This continues to disadvantage low-resource languages such as Luganda, which lack high-quality parallel corpora; even Google Translate does not serve Luganda at the time of this writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. We then train NMT models on this dataset with hyper-parameter search. Experiments yield a BLEU score of 21.28 from Luganda to English and 17.47 from English to Luganda. Sample translations illustrate the high quality of the output. To our knowledge, our model is the first Luganda-English NMT model. The bilingual dataset we built will be made available to the public.
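The BLEU scores reported above are corpus-level metrics. As a rough illustration of how such a score is computed, here is a minimal single-reference sketch of corpus-level BLEU (clipped n-gram precisions combined with a brevity penalty); it does not reproduce the exact tokenization or smoothing used in the paper's evaluation.

```python
# Minimal corpus-level BLEU sketch: clipped n-gram precision + brevity penalty.
# Illustrative only; real evaluations typically use a tool such as sacrebleu.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # total hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts = Counter(ngrams(h, n))
            r_counts = Counter(ngrams(r, n))
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages hypotheses shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)

# Identical hypothesis and reference give the maximum score of 100.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))
```

A perfect match scores 100; the paper's 21.28 and 17.47 are on this same 0-100 scale.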