We study the selection of transfer languages for three natural language processing tasks: sentiment analysis, named entity recognition, and dependency parsing. To select an optimal transfer language, we propose using linguistic similarity metrics to measure the distance between languages, and basing the choice of transfer language on this information rather than on intuition. We demonstrate that linguistic similarity correlates with cross-lingual transfer performance for all three tasks. We also show that choosing the optimal language as the transfer source, instead of English, makes a statistically significant difference. This allows us to select a more suitable transfer language, which can be used to better leverage knowledge from high-resource languages and improve the performance of language applications that lack data. The study uses datasets in eight languages from three language families.
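As an illustration of distance-based selection, the following is a minimal sketch and not the procedure used in the study. It assumes each language is described by a binary typological feature vector and that cosine distance stands in for a linguistic similarity metric; the feature values, the language labels, and the function names `cosine_distance` and `rank_transfer_languages` are all hypothetical.

```python
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector shares no features; treat it as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_transfer_languages(target, candidates, features):
    """Rank candidate transfer languages by distance to the target (closest first)."""
    distances = {
        lang: cosine_distance(features[target], features[lang])
        for lang in candidates
        if lang != target
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical toy feature vectors (assumption: not taken from any real database).
TOY_FEATURES = {
    "target": [1, 0, 1, 1, 0],
    "cand_a": [1, 0, 1, 0, 0],
    "cand_b": [0, 1, 0, 0, 1],
    "cand_c": [1, 1, 1, 1, 0],
}

ranking = rank_transfer_languages(
    "target", ["cand_a", "cand_b", "cand_c"], TOY_FEATURES
)
best = ranking[0][0]  # candidate with the smallest distance to the target
```

In this sketch the closest candidate is chosen as the transfer source. Whether the closest language under a given metric actually gives the best downstream performance is the empirical question the study addresses.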