In neural network-based models for natural language processing (NLP), word embeddings often account for the largest share of the parameters. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size, so storing these models in memory and on disk is costly. In this study, to reduce the total number of parameters, we represent the embeddings of all words by transforming a single shared embedding. The proposed method, ALONE (all word embeddings from one), constructs the embedding of each word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable. The constructed embedding is then fed into a feed-forward neural network to increase its expressiveness. Naively, the filter vectors would occupy the same memory as the conventional embedding matrix, whose size depends on the vocabulary size; to solve this issue, we also introduce a memory-efficient filter construction approach. We show through an experiment on the reconstruction of pre-trained word embeddings that ALONE is sufficiently expressive to serve as a word representation. In addition, we conduct experiments on downstream NLP tasks: machine translation and summarization. We combined ALONE with the current state-of-the-art encoder-decoder model, the Transformer, and achieved comparable scores on WMT 2014 English-to-German translation and DUC 2004 very short summarization with fewer parameters.
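To make the construction concrete, below is a minimal PyTorch sketch of the idea described above: every word embedding is produced as FFN(shared ⊙ filter_w), where the filter is word-specific but non-trainable. The class name, hyperparameters (num_codebooks, codebook_size, hidden_dim), and the codebook-based filter scheme are illustrative assumptions; the codebooks are one memory-efficient way to realize vocabulary-sized filters without a vocabulary-sized table, not necessarily the paper's exact construction.

```python
# A minimal sketch under the assumptions stated above; not the paper's
# reference implementation.
import torch
import torch.nn as nn


class ALONEEmbedding(nn.Module):
    """All word embeddings from one shared embedding.

    embedding(w) = FFN(shared * filter_w), where filter_w is word-specific
    but non-trainable.
    """

    def __init__(self, vocab_size, dim, hidden_dim,
                 num_codebooks=4, codebook_size=64, seed=0):
        super().__init__()
        # The single trainable source embedding shared by all words.
        self.shared = nn.Parameter(torch.randn(dim))
        # Memory-efficient filters (assumed scheme): instead of storing one
        # filter per word (which would cost as much as a full embedding
        # matrix), keep a few small random codebooks plus a per-word index
        # into each codebook. Both are fixed buffers (non-trainable) and can
        # be regenerated from the seed.
        g = torch.Generator().manual_seed(seed)
        self.register_buffer(
            "codebooks",
            torch.randn(num_codebooks, codebook_size, dim, generator=g))
        self.register_buffer(
            "codes",
            torch.randint(0, codebook_size, (vocab_size, num_codebooks),
                          generator=g))
        # Feed-forward network that increases the expressiveness of the
        # filtered embedding.
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim))

    def filter_vector(self, word_ids):
        # Build each word's filter by summing one vector from each codebook,
        # selected by that word's fixed code. (batch, M, dim) -> (batch, dim)
        idx = self.codes[word_ids]  # (batch, num_codebooks)
        selected = torch.stack(
            [self.codebooks[m, idx[:, m]]
             for m in range(self.codebooks.size(0))], dim=1)
        return selected.sum(dim=1)

    def forward(self, word_ids):
        filt = self.filter_vector(word_ids)  # word-specific, non-trainable
        filtered = self.shared * filt        # modify the shared embedding
        return self.ffn(filtered)            # final word embedding


# Usage: embeddings for three (hypothetical) word ids in a 32k vocabulary.
emb = ALONEEmbedding(vocab_size=32000, dim=512, hidden_dim=2048)
vecs = emb(torch.tensor([3, 17, 4096]))  # shape (3, 512)
```

For a sense of the savings under these assumed sizes: a conventional 32,000 × 512 embedding matrix stores about 16.4M floats, whereas this sketch trains only the shared vector plus the FFN (about 2.1M parameters for hidden_dim=2048) and stores fixed codebooks of 4 × 64 × 512 ≈ 131K floats plus a small integer code table.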