Translating to and from low-resource languages is a challenge for machine translation (MT) systems due to a lack of parallel data. In this paper, we address the issue of domain-specific MT for Bambara, an under-resourced Mande language spoken in Mali. We present the first domain-specific parallel dataset for MT between Bambara and French. We discuss challenges in working with small quantities of domain-specific data for a low-resource language, and we present the results of machine learning experiments on this data.