Several studies have explored the advantages of multilingual pre-trained models (e.g., multilingual BERT) in capturing shared linguistic knowledge. However, their limitations have received far less attention. In this paper, we investigate the representation degeneration problem in the multilingual contextual word representations (CWRs) of BERT and show that the embedding spaces of the selected languages suffer from anisotropy. Our experimental results demonstrate that, as with their monolingual counterparts, increasing the isotropy of the multilingual embedding space significantly improves its representation power and downstream performance. Our analysis further indicates that, although the degenerated directions vary across languages, they encode similar linguistic knowledge, suggesting a shared linguistic space among languages.
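To make the notions of anisotropy and isotropy enhancement concrete, the sketch below (hypothetical function names, NumPy-based) estimates anisotropy as the average cosine similarity between randomly sampled pairs of CWRs and increases isotropy by mean-centering and projecting out the top principal directions, in the spirit of all-but-the-top post-processing; it is an illustrative assumption, not the exact procedure used in this paper.

```python
import numpy as np

def isotropy_score(embeddings, n_samples=10000, seed=0):
    """Approximate anisotropy as the expected cosine similarity between
    randomly sampled pairs of contextual embeddings. Values near 0 suggest
    an isotropic space; values near 1 indicate severe degeneration."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(embeddings), n_samples)
    j = rng.integers(0, len(embeddings), n_samples)
    a = embeddings[i] / np.linalg.norm(embeddings[i], axis=1, keepdims=True)
    b = embeddings[j] / np.linalg.norm(embeddings[j], axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

def remove_dominant_directions(embeddings, n_components=5):
    """Mean-center the embeddings and project out the top principal
    components (the degenerated directions), which typically makes the
    space more isotropic. n_components is a tunable assumption."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]                      # shape (k, d)
    return centered - centered @ top.T @ top     # project out the top-k directions
```

In this setup, one would extract CWRs for a language from multilingual BERT, compare `isotropy_score` before and after `remove_dominant_directions`, and evaluate the adjusted embeddings on downstream tasks.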