Related work has used indices such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that the assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis: explaining zero-shot cross-lingual transfer. We highlight the valuable aspects of cross-lingual similarity that these indices fail to capture and provide a motivating case study \textit{demonstrating the problem empirically}. We then introduce \textit{Average Neuron-Wise Correlation (ANC)} as a straightforward alternative that avoids the difficulties of CKA/CCA and is well suited specifically to the cross-lingual context. Finally, we use ANC to provide evidence that the previously reported ``first align, then predict'' pattern emerges not only in masked language models (MLMs) but also in multilingual models with \textit{causal language modeling} objectives (CLMs). Moreover, we show that the pattern extends to \textit{scaled versions} of these MLMs and CLMs (up to 85x the size of the original mBERT).\footnote{Our code is publicly available at \url{https://github.com/TartuNLP/xsim}}
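As a rough sketch of the idea behind ANC (our own illustrative formulation; the index is defined formally later in the paper, and details such as centering and sign handling are assumptions here): given activation matrices $X, Y \in \mathbb{R}^{n \times d}$ collected from the same layer for $n$ parallel sentences in two languages, with $x_i$ and $y_i$ denoting the activations of the $i$-th neuron across the $n$ sentences,
\[
\mathrm{ANC}(X, Y) \;=\; \frac{1}{d} \sum_{i=1}^{d} \rho\!\left(x_i, y_i\right),
\]
where $\rho$ is the Pearson correlation computed over the $n$ sentence pairs. Note that, unlike CKA/CCA, such an index compares neurons index-by-index rather than up to a linear transformation of the representation space.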