There is growing evidence that when little or no data is available in a target language, training on a different language can yield surprisingly good results. However, there are currently no established guidelines for choosing the training (source) language. In an attempt to address this issue, we thoroughly analyze a state-of-the-art multilingual model and try to determine what makes transfer between languages successful. In contrast to the majority of the multilingual NLP literature, we train not only on English but on a group of almost 30 languages. We show that looking at particular syntactic features is 2-4 times more helpful in predicting performance than an aggregated syntactic similarity. We find that the importance of syntactic features differs strongly depending on the downstream task: no single feature is a good performance predictor for all NLP tasks. As a result, one should not expect that for a target language $L_1$ there is a single source language $L_2$ that is the best choice for every NLP task (for instance, for Bulgarian the best source language is French for POS tagging, Russian for NER, and Thai for NLI). We discuss the most important linguistic features affecting transfer quality using statistical and machine learning methods.
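The claim that individual syntactic features predict transfer quality better than a single aggregated similarity score can be illustrated with a small regression sketch. The snippet below is a hypothetical illustration on synthetic data (the feature set, weights, and scores are assumptions, not the paper's setup): it compares the cross-validated R^2 of a regression on per-feature disagreements between source and target against one using only an aggregated similarity value.

```python
# Hypothetical sketch: per-feature syntactic predictors vs. one aggregated
# syntactic-similarity score for predicting cross-lingual transfer performance.
# All data here is synthetic and illustrative, not taken from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_pairs = 200       # hypothetical (source, target) language pairs
n_features = 10     # hypothetical binary syntactic features (WALS-style)

# X_feat[i, j] = 1 if source and target disagree on syntactic feature j
X_feat = rng.integers(0, 2, size=(n_pairs, n_features)).astype(float)
# Aggregated similarity = 1 - mean feature disagreement (one number per pair)
X_aggr = 1.0 - X_feat.mean(axis=1, keepdims=True)

# Synthetic "transfer score": driven by a few individual features plus noise
true_weights = np.zeros(n_features)
true_weights[[1, 4, 7]] = [-0.3, -0.2, -0.25]
y = 0.8 + X_feat @ true_weights + rng.normal(0, 0.05, size=n_pairs)

r2_feat = cross_val_score(LinearRegression(), X_feat, y, cv=5, scoring="r2").mean()
r2_aggr = cross_val_score(LinearRegression(), X_aggr, y, cv=5, scoring="r2").mean()

print(f"R^2 with individual syntactic features: {r2_feat:.2f}")
print(f"R^2 with aggregated similarity only:    {r2_aggr:.2f}")
```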