Multilingual training is an effective method for improving extremely low-resource neural machine translation, and it can be improved further by using back-translation to turn monolingual data into synthetic bilingual corpora. This work focuses on closely related languages of the Uralic language family from the Estonian and Finnish geographical regions. We find that multilingual learning and synthetic corpora increase translation quality in every language pair for which we have data. We show that transfer learning and fine-tuning are very effective for low-resource machine translation and achieve the best results. We collected new parallel data for Võro, North Saami, and South Saami, and present the first neural machine translation results for these languages.