Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this work, we specifically highlight the importance of word embedding alignment by proposing a pre-training objective (ALIGN-MLM) whose auxiliary loss guides similar words in different languages to have similar word embeddings. ALIGN-MLM either outperforms or matches three widely adopted objectives (MLM, XLM, DICT-MLM) when we evaluate transfer between pairs of natural languages and their counterparts created by systematically modifying specific properties like the script. In particular, ALIGN-MLM outperforms XLM and MLM by 35 and 30 F1 points on POS-tagging for transfer between languages that differ both in their script and word order (left-to-right vs. right-to-left). We also show a strong correlation between alignment and transfer for all objectives (e.g., rho=0.727 for XNLI), which together with ALIGN-MLM's strong performance calls for explicitly aligning word embeddings for multilingual models.
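To make the auxiliary alignment objective concrete, below is a minimal PyTorch sketch of one plausible formulation: a cosine-similarity loss that pulls together the embeddings of dictionary-aligned word pairs, added to the standard MLM loss. The function name align_auxiliary_loss, the weighting factor lambda_align, and the exact cosine formulation are illustrative assumptions, not the paper's precise definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def align_auxiliary_loss(embedding: nn.Embedding,
                         src_ids: torch.LongTensor,
                         tgt_ids: torch.LongTensor) -> torch.Tensor:
    """Hypothetical alignment term: encourage translation pairs
    (src_ids[i] <-> tgt_ids[i], e.g. from a bilingual dictionary)
    to have similar word embeddings."""
    src_vecs = embedding(src_ids)   # (N, d) source-language word embeddings
    tgt_vecs = embedding(tgt_ids)   # (N, d) target-language translations
    # 1 - cosine similarity is 0 for perfectly aligned pairs, up to 2 for opposed ones
    return (1.0 - F.cosine_similarity(src_vecs, tgt_vecs, dim=-1)).mean()

# Sketch of how the term would combine with the masked-language-modeling loss
# (lambda_align is an assumed hyperparameter weighting the auxiliary loss):
#   loss = mlm_loss + lambda_align * align_auxiliary_loss(model.embeddings, src_ids, tgt_ids)
```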