Large multilingual language models show remarkable zero-shot cross-lingual transfer performance on a range of tasks. Subsequent works hypothesized that these models internally project representations of different languages into a shared interlingual space. However, these works produced contradictory results. In this paper, we revisit the well-known prior work claiming that "BERT is not an Interlingua" and show that, with the proper choice of sentence representation, different languages actually do converge to a shared space in such language models. Furthermore, we demonstrate that this convergence pattern is robust across four correlation-based similarity measures and six mBERT-like models. We then extend our analysis to 28 diverse languages and find that the interlingual space exhibits a particular structure that mirrors the linguistic relatedness of the languages. We also highlight a few outlier languages that seem to fail to converge to the shared space. The code for replicating our results is available at the following URL: https://github.com/maksym-del/interlingua.
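To make the measurement concrete, below is a minimal sketch (not the authors' exact pipeline) of one commonly used correlation-based similarity measure, linear CKA, applied to mean-pooled mBERT sentence representations of parallel sentences in two languages. The model checkpoint, layer index, pooling strategy, and example sentences are illustrative assumptions; the paper's four measures and six models are not enumerated in this abstract.

```python
# Illustrative sketch: linear CKA between mBERT sentence representations
# of parallel English/German sentences. Checkpoint, layer, and pooling
# are assumptions for demonstration, not the paper's exact setup.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def sentence_reps(sentences, layer=8):
    """Mean-pool hidden states of one layer over non-padding tokens."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    hidden = out.hidden_states[layer]                       # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()      # (batch, seq, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()   # (batch, dim)

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (rows = sentences)."""
    X = X - X.mean(0, keepdims=True)
    Y = Y - Y.mean(0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Usage: translation-equivalent sentences in two languages.
en = ["The cat sleeps on the mat.", "I like strong coffee."]
de = ["Die Katze schläft auf der Matte.", "Ich mag starken Kaffee."]
print(linear_cka(sentence_reps(en), sentence_reps(de)))
```

A higher CKA score for aligned parallel sentences than for mismatched pairs would indicate that the two languages occupy a shared region of the representation space at that layer.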