The paper describes the University of Cape Town's submission to the constrained track of the WMT22 shared task: Large-Scale Machine Translation Evaluation for African Languages. Our system is a single multilingual translation model that translates between English and 8 South / South East African languages, as well as between specific pairs of the African languages. We used several techniques suited for low-resource machine translation (MT), including overlap BPE, back-translation, synthetic training data generation, and adding more translation directions during training. Our results show the value of these techniques, especially for directions where very little or no bilingual training data is available.
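As a toy illustration of the back-translation step mentioned above, the following is a minimal sketch, assuming a hypothetical `translate_to_english` reverse-direction model wrapper (not the actual system described in the paper): monolingual target-side text is translated back into English to form synthetic parallel pairs, which are then mixed with the real bitext for training.

```python
from typing import Callable, List, Tuple


def back_translate(
    monolingual_target: List[str],
    translate_to_english: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (English, target) pairs from monolingual target-side text.

    The English side is machine output from a hypothetical reverse-direction
    model; the target side is genuine text, so the decoder still learns from
    clean target-language data.
    """
    return [(translate_to_english(sent), sent) for sent in monolingual_target]


# Hypothetical usage: combine synthetic pairs with the available real bitext.
real_bitext = [("Thank you very much.", "Ngiyabonga kakhulu.")]  # genuine parallel data
mono = ["Sawubona, unjani namhlanje?"]                           # monolingual target text
synthetic = back_translate(mono, lambda s: "[EN] " + s)          # stub model, illustration only
training_data = real_bitext + synthetic
print(training_data)
```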