Revisiting Language Encoding in Learning Multilingual Representations

The Transformer has demonstrated great power in learning contextual word representations for multiple languages in a single model. To process multilingual sentences in the model, a learnable vector, called a "language embedding", is usually assigned to each language. The language embedding can be either added to the word embedding or attached at the beginning of the sentence. It serves as a language-specific signal for the Transformer to capture contextual representations across languages. In this paper, we revisit the use of language embedding and identify several problems in the existing formulations. By investigating the interaction between language embedding and word embedding in the self-attention module, we find that the current methods cannot reflect the language-specific word correlation well. Given these findings, we propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding. For a sentence, XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model, which processes them with their language-specific meanings. In this way, XLP achieves the purpose of appropriately encoding "language" in a multilingual Transformer model. Experimental results show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets. Code and models will be released at https://github.com/lsj2408/XLP.
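
To make the contrast in the abstract concrete, the sketch below compares the two language-embedding variants it mentions (adding a language vector to each word embedding, or attaching it at the start of the sentence) with a projection-style alternative. This is only an illustration under stated assumptions: the abstract does not give the exact XLP formulation, so the projection is assumed here to be a single language-specific matrix applied to each word embedding. All function names, the dimension, and the random initial values are hypothetical and are not taken from the released code.

```python
import random


def make_matrix(rows, cols, rng):
    """Random stand-in for a learnable parameter matrix (hypothetical init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def add_language_embedding(word_embs, lang_emb):
    """Variant 1: add the same language vector to every word embedding."""
    return [[w + l for w, l in zip(vec, lang_emb)] for vec in word_embs]


def prepend_language_embedding(word_embs, lang_emb):
    """Variant 2: attach the language vector at the beginning of the sentence."""
    return [list(lang_emb)] + [list(vec) for vec in word_embs]


def project_language(word_embs, lang_proj):
    """Projection variant (ASSUMED form): map each word embedding through a
    language-specific matrix, giving vec @ lang_proj for every token."""
    out_dim = len(lang_proj[0])
    return [
        [sum(x * lang_proj[i][j] for i, x in enumerate(vec)) for j in range(out_dim)]
        for vec in word_embs
    ]


rng = random.Random(0)
dim = 4
languages = ["en", "zh"]

# One learnable vector and one learnable matrix per language (hypothetical).
lang_embeddings = {lang: make_matrix(1, dim, rng)[0] for lang in languages}
lang_projections = {lang: make_matrix(dim, dim, rng) for lang in languages}

# A toy "sentence" of three token embeddings.
sentence = make_matrix(3, dim, rng)

added = add_language_embedding(sentence, lang_embeddings["en"])
prepended = prepend_language_embedding(sentence, lang_embeddings["en"])
projected = project_language(sentence, lang_projections["en"])
```

In the additive variant every token receives the same offset regardless of which word it is, whereas in the projection variant the language-specific change depends on the word embedding itself. That difference is the kind of word-language interaction the abstract argues the existing formulations fail to capture.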

