Low-Resource Neural Machine Translation for Southern African Languages

Low-resource African languages have not fully benefited from the progress in neural machine translation because of a lack of data. Motivated by this challenge, we compare zero-shot learning, transfer learning, and multilingual learning on three Bantu languages (Shona, isiXhosa, and isiZulu) and English. Our main target is English-to-isiZulu translation, for which we have just 30,000 sentence pairs, 28% of the average size of our other corpora. We show the importance of language similarity to the performance of English-to-isiZulu transfer learning based on English-to-isiXhosa and English-to-Shona parent models, whose BLEU scores differ by 5.2. We then demonstrate that multilingual learning surpasses both transfer learning and zero-shot learning on our dataset, with BLEU score improvements relative to the baseline English-to-isiZulu model of 9.9, 6.1, and 2.0, respectively. Our best model also improves on the previous state-of-the-art BLEU score by more than 10.
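As a rough illustration of what multilingual learning can look like in practice, the sketch below merges several English-to-X corpora into one training set by prepending a target-language token to each source sentence. This is a common technique for training a single multilingual model; the abstract does not say which mechanism the authors used, so the token format (`<2zu>`), the function names, and the one-sentence toy corpora are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (English source sentence, target sentence)


def tag_pairs(pairs: List[Pair], target_lang: str) -> List[Pair]:
    """Prepend a target-language token (hypothetical format) to each source."""
    return [(f"<2{target_lang}> {src}", tgt) for src, tgt in pairs]


def build_multilingual_corpus(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """Concatenate all tagged corpora into one training set."""
    mixed: List[Pair] = []
    for lang, pairs in corpora.items():
        mixed.extend(tag_pairs(pairs, lang))
    return mixed


# Toy placeholder data, not taken from the paper's corpora.
corpora = {
    "zu": [("Hello", "Sawubona")],  # isiZulu
    "xh": [("Hello", "Molo")],      # isiXhosa
    "sn": [("Hello", "Mhoro")],     # Shona
}
mixed = build_multilingual_corpus(corpora)
```

With this kind of setup, the small English-to-isiZulu corpus is trained alongside the larger related-language corpora, and the token tells the model which language to produce at inference time.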

