Massively multilingual machine translation (MT) has shown impressive capabilities, including zero-shot and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to the lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus for the low-resource Yor\`ub\'a--English (yo--en) language pair with standardized train-test splits for benchmarking. We provide several neural MT (NMT) benchmarks on this dataset and compare them to the performance of popular pre-trained (massively multilingual) MT models, showing that, in almost all cases, our simple benchmarks outperform the pre-trained MT models. A major gain of BLEU $+9.9$ and $+8.6$ (en2yo) is achieved in comparison to Facebook's M2M-100 and Google multilingual NMT, respectively, when we use MENYO-20k to fine-tune generic models.