In neural network-based models for natural language processing (NLP), word embeddings often account for the largest share of the parameters. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size, so storing these models in memory and on disk is costly. In this study, to reduce the total number of parameters, we represent the embeddings of all words by transforming a single shared embedding. The proposed method, ALONE (all word embeddings from one), constructs the embedding of each word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable. The constructed embedding is then fed into a feed-forward neural network to increase its expressiveness. Naively, the filter vectors would occupy the same memory as the conventional embedding matrix, whose size depends on the vocabulary size; to solve this issue, we also introduce a memory-efficient filter construction approach. We show through an experiment on the reconstruction of pre-trained word embeddings that ALONE is sufficiently expressive to serve as a word representation. In addition, we conduct experiments on downstream NLP tasks: machine translation and summarization. We combined ALONE with the current state-of-the-art encoder-decoder model, the Transformer, and achieved comparable scores on WMT 2014 English-to-German translation and DUC 2004 very short summarization with fewer parameters.
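To make the construction concrete, below is a minimal PyTorch sketch of the idea described above: every word embedding is produced as FFN(shared ⊙ filter_w), where the filter is word-specific but non-trainable. The class name, hyperparameters (num_codebooks, codebook_size, hidden_dim), and the codebook-based filter scheme are illustrative assumptions; the codebooks are one memory-efficient way to realize vocabulary-sized filters without a vocabulary-sized table, not necessarily the paper's exact construction.

```python
# A minimal sketch under the assumptions stated above; not the paper's
# reference implementation.
import torch
import torch.nn as nn


class ALONEEmbedding(nn.Module):
    """All word embeddings from one shared embedding.

    embedding(w) = FFN(shared * filter_w), where filter_w is word-specific
    but non-trainable.
    """

    def __init__(self, vocab_size, dim, hidden_dim,
                 num_codebooks=4, codebook_size=64, seed=0):
        super().__init__()
        # The single trainable source embedding shared by all words.
        self.shared = nn.Parameter(torch.randn(dim))
        # Memory-efficient filters (assumed scheme): instead of storing one
        # filter per word (which would cost as much as a full embedding
        # matrix), keep a few small random codebooks plus a per-word index
        # into each codebook. Both are fixed buffers (non-trainable) and can
        # be regenerated from the seed.
        g = torch.Generator().manual_seed(seed)
        self.register_buffer(
            "codebooks",
            torch.randn(num_codebooks, codebook_size, dim, generator=g))
        self.register_buffer(
            "codes",
            torch.randint(0, codebook_size, (vocab_size, num_codebooks),
                          generator=g))
        # Feed-forward network that increases the expressiveness of the
        # filtered embedding.
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim))

    def filter_vector(self, word_ids):
        # Build each word's filter by summing one vector from each codebook,
        # selected by that word's fixed code. (batch, M, dim) -> (batch, dim)
        idx = self.codes[word_ids]  # (batch, num_codebooks)
        selected = torch.stack(
            [self.codebooks[m, idx[:, m]]
             for m in range(self.codebooks.size(0))], dim=1)
        return selected.sum(dim=1)

    def forward(self, word_ids):
        filt = self.filter_vector(word_ids)  # word-specific, non-trainable
        filtered = self.shared * filt        # modify the shared embedding
        return self.ffn(filtered)            # final word embedding


# Usage: embeddings for three (hypothetical) word ids in a 32k vocabulary.
emb = ALONEEmbedding(vocab_size=32000, dim=512, hidden_dim=2048)
vecs = emb(torch.tensor([3, 17, 4096]))  # shape (3, 512)
```

For a sense of the savings under these assumed sizes: a conventional 32,000 × 512 embedding matrix stores about 16.4M floats, whereas this sketch trains only the shared vector plus the FFN (about 2.1M parameters for hidden_dim=2048) and stores fixed codebooks of 4 × 64 × 512 ≈ 131K floats plus a small integer code table.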