Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, enabling effective zero-shot cross-lingual transfer of syntactic knowledge. Transfer is more successful between some language pairs than others, yet it is not well understood what drives this variation or whether it faithfully reflects differences between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT across 24 typologically diverse languages. We demonstrate that the distances between the distributions of different languages are highly consistent with the syntactic differences characterized by linguistic formalisms. These differences, learnt via self-supervision, play a crucial role in zero-shot transfer performance and can be predicted from the variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insight into the mechanism of cross-lingual transfer.
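As a minimal sketch of the kind of comparison described above (not the paper's actual pipeline), one can represent each language as a probability distribution over grammatical-relation labels, such as Universal Dependencies relations induced from mBERT, and measure the distance between two such distributions with a symmetric divergence. The relation labels and counts below are hypothetical placeholders.

```python
import numpy as np

# Illustrative subset of Universal Dependencies relation labels.
RELATIONS = ["nsubj", "obj", "obl", "amod", "case", "det"]

def js_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon distance between two discrete distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        # KL divergence in bits, skipping zero-probability entries.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

# Hypothetical relation counts induced from mBERT for two languages.
lang_a = np.array([900, 620, 300, 410, 150, 880], dtype=float)
lang_b = np.array([700, 500, 450, 120, 990, 60], dtype=float)

print(f"JS distance = {js_distance(lang_a, lang_b):.3f}")
```

Under this framing, one would compute such pairwise distances for all 24 languages and test whether they correlate with zero-shot transfer performance and with typological (morphosyntactic) differences.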