Cross-lingual alignments of ELMo contextual embeddings

Building machine learning prediction models for a specific NLP task requires sufficient training data, which can be difficult to obtain for less-resourced languages. Cross-lingual embeddings map word embeddings from a less-resourced language to a resource-rich language so that a prediction model trained on data from the resource-rich language can also be used in the less-resourced language. To produce cross-lingual mappings of recent contextual embeddings, anchor points between the embedding spaces have to be words in the same context. We address this issue with a novel method for creating cross-lingual contextual alignment datasets. Based on that, we propose several cross-lingual mapping methods for ELMo embeddings. The proposed linear mapping methods use existing Vecmap and MUSE alignments on contextual ELMo embeddings. Novel nonlinear ELMoGAN mapping methods are based on GANs and do not assume isomorphic embedding spaces. We evaluate the proposed mapping methods on nine languages, using four downstream tasks: named entity recognition (NER), dependency parsing (DP), terminology alignment, and sentiment analysis. The ELMoGAN methods perform very well on the NER and terminology alignment tasks, with a lower cross-lingual loss for NER compared to the direct training on some languages. In DP and sentiment analysis, linear contextual alignment variants are more successful.
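To illustrate the linear mapping idea mentioned in the abstract, here is a minimal sketch of an orthogonal (Procrustes) alignment between two embedding spaces, the kind of linear map that tools in the Vecmap and MUSE family typically fit. It is not the authors' implementation: the function name `procrustes_map`, the dimensions, and the synthetic data are assumptions made for illustration. The random vectors stand in for contextual ELMo embeddings of anchor words that appear in matching contexts in the two languages.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_anchors, dim); row i of src and row i of tgt
    are assumed to be embeddings of the same anchor word in matching contexts.
    """
    # Closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic stand-in data (hypothetical sizes, not taken from the paper).
rng = np.random.default_rng(0)
dim, n_anchors = 8, 50
tgt_vecs = rng.normal(size=(n_anchors, dim))

# Build a "source language" space as a hidden rotation of the target space,
# i.e. the isomorphic case that linear methods assume.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_vecs = tgt_vecs @ hidden_rotation.T

W = procrustes_map(src_vecs, tgt_vecs)
mapped = src_vecs @ W  # source embeddings expressed in the target space

print("max alignment error:", np.abs(mapped - tgt_vecs).max())
```

In this toy setting the two spaces are exactly isomorphic, so a single orthogonal matrix recovers the alignment. The nonlinear ELMoGAN methods described in the abstract target the case where that assumption does not hold.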
