We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear subspaces after mean-centering, evaluated based on causal effects on language modeling performance and direct comparisons between subspaces for 88 languages. The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies. Shifting representations by language means is sufficient to induce token predictions in different languages. However, we also identify stable language-neutral axes that encode information such as token positions and part-of-speech. We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information. These results demonstrate that multilingual language models encode information along orthogonal language-sensitive and language-neutral axes, allowing the models to extract a variety of features for downstream tasks and cross-lingual transfer learning.
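The mean-centering and mean-shift operations described above can be sketched with synthetic data. This is a minimal illustration, not the paper's implementation: the toy 8-dimensional vectors stand in for XLM-R hidden states, and the language labels and separation are assumptions chosen to make the geometry visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden states: 100 tokens per language, 8-dim
# (a stand-in for XLM-R's 768-dim representations).
hidden = {
    "en": rng.normal(loc=2.0, size=(100, 8)),
    "fr": rng.normal(loc=-2.0, size=(100, 8)),
}

# Per-language means lie along the language-sensitive axes.
means = {lang: h.mean(axis=0) for lang, h in hidden.items()}

# Mean-centering aligns the language subspaces at a shared origin,
# exposing the language-neutral structure.
centered = {lang: h - means[lang] for lang, h in hidden.items()}

# Shifting a centered English representation by the French mean moves it
# into the French region of the space (the mean-shift intervention that
# induces token predictions in the other language).
en_token = centered["en"][0]
shifted_to_fr = en_token + means["fr"]

# Sanity check: the shifted vector sits closer to the French mean
# than to the English one.
d_fr = np.linalg.norm(shifted_to_fr - means["fr"])
d_en = np.linalg.norm(shifted_to_fr - means["en"])
print(d_fr < d_en)
```

The same two-step recipe (subtract the source-language mean, add the target-language mean) is what the abstract refers to as shifting representations by language means.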