Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this work, we specifically highlight the importance of word embedding alignment by proposing a pre-training objective (ALIGN-MLM) whose auxiliary loss guides similar words in different languages to have similar word embeddings. ALIGN-MLM either outperforms or matches three widely adopted objectives (MLM, XLM, DICT-MLM) when we evaluate transfer between pairs of natural languages and their counterparts created by systematically modifying specific properties like the script. In particular, ALIGN-MLM outperforms XLM and MLM by 35 and 30 F1 points on POS-tagging for transfer between languages that differ both in their script and word order (left-to-right vs. right-to-left). We also show a strong correlation between alignment and transfer for all objectives (e.g., rho=0.727 for XNLI), which together with ALIGN-MLM's strong performance calls for explicitly aligning word embeddings for multilingual models.
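To make the auxiliary alignment objective concrete, below is a minimal PyTorch sketch of one plausible formulation: a cosine-similarity loss that pulls together the embeddings of dictionary-aligned word pairs, added to the standard MLM loss. The function name align_auxiliary_loss, the weighting factor lambda_align, and the exact cosine formulation are illustrative assumptions, not the paper's precise definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def align_auxiliary_loss(embedding: nn.Embedding,
                         src_ids: torch.LongTensor,
                         tgt_ids: torch.LongTensor) -> torch.Tensor:
    """Hypothetical alignment term: encourage translation pairs
    (src_ids[i] <-> tgt_ids[i], e.g. from a bilingual dictionary)
    to have similar word embeddings."""
    src_vecs = embedding(src_ids)   # (N, d) source-language word embeddings
    tgt_vecs = embedding(tgt_ids)   # (N, d) target-language translations
    # 1 - cosine similarity is 0 for perfectly aligned pairs, up to 2 for opposed ones
    return (1.0 - F.cosine_similarity(src_vecs, tgt_vecs, dim=-1)).mean()

# Sketch of how the term would combine with the masked-language-modeling loss
# (lambda_align is an assumed hyperparameter weighting the auxiliary loss):
#   loss = mlm_loss + lambda_align * align_auxiliary_loss(model.embeddings, src_ids, tgt_ids)
```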