This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yor\`ub\'a greetings ($\mathcal{E}$ k\'u [MASK]), an integral part of Yor\`ub\'a language and culture, into English. To evaluate these models, we present IkiniYor\`ub\'a, a Yor\`ub\'a-English translation dataset of Yor\`ub\'a greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google Translate and NLLB, and show that these models struggle to accurately translate Yor\`ub\'a greetings into English. In addition, we trained a Yor\`ub\'a-English model by finetuning an existing NMT model on the training split of IkiniYor\`ub\'a; this model outperformed the pre-trained multilingual NMT models, despite the far larger volumes of data on which they were trained.
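As a rough, self-contained illustration of the finetuning step described above, the sketch below adapts a pre-trained multilingual NMT checkpoint to Yor\`ub\'a-English data; it assumes the Hugging Face Transformers and Datasets APIs, and the checkpoint (facebook/nllb-200-distilled-600M), hyperparameters, output path, and toy sentence pairs are illustrative assumptions rather than the paper's actual configuration. In practice, the IkiniYor\`ub\'a training split would replace the toy data.

\begin{verbatim}
# Minimal sketch: finetuning a pre-trained multilingual NMT model on a
# Yoruba-English parallel set with Hugging Face Transformers. The
# checkpoint, hyperparameters, and the two toy sentence pairs
# (diacritics omitted) are illustrative assumptions, not the paper's
# actual configuration.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "facebook/nllb-200-distilled-600M"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy stand-in for the IkiniYoruba training split.
train_data = Dataset.from_dict({
    "yo": ["E ku aaro", "E ku ise"],
    "en": ["Good morning", "Well done with your work"],
})

def preprocess(batch):
    # Tokenize the Yoruba source and English target sides together.
    return tokenizer(batch["yo"], text_target=batch["en"],
                     truncation=True, max_length=128)

train_tokenized = train_data.map(preprocess, batched=True,
                                 remove_columns=["yo", "en"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="nllb-yo-en-ikiniyoruba",  # hypothetical output path
        learning_rate=5e-5,
        per_device_train_batch_size=8,
        num_train_epochs=3,
    ),
    train_dataset=train_tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
\end{verbatim}

Starting from a massively multilingual checkpoint means the small IkiniYor\`ub\'a training split only needs to correct greeting-specific errors, rather than teach Yor\`ub\'a-English translation from scratch.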