Despite the growing number of large and comprehensive machine translation (MT) systems, evaluation of these methods across diverse languages has been constrained by the lack of high-quality parallel corpora and by limited engagement with the people who speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from the Turkic language family, most of which are severely under-explored. First, we adopt the TIL Corpus with a few key improvements to the training and evaluation sets. Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations. We find that the MNMT model outperforms almost all bilingual baselines on the out-of-domain test sets, and that fine-tuning the model on the downstream task of a single language pair yields a large performance boost in both low- and high-resource scenarios. Our careful analysis of evaluation criteria for MT models in Turkic languages also points to the need for further research in this direction. We release the corpus splits, test sets, and models to the public.