In the era of deep learning, word embeddings are essential for text tasks. However, storing and accessing these embeddings requires a large amount of space, which hinders the deployment of such models on resource-limited devices. Leveraging the powerful compression capability of the tensor product, we propose a word embedding compression method with morphological augmentation, Morphologically-enhanced Tensorized Embeddings (MorphTE). A word consists of one or more morphemes, the smallest units that bear meaning or serve a grammatical function. MorphTE represents a word embedding as an entangled form of its morpheme vectors via the tensor product, which injects prior semantic and grammatical knowledge into the learning of embeddings. Furthermore, the dimensionality of morpheme vectors and the size of the morpheme vocabulary are much smaller than those of words, which greatly reduces the number of word embedding parameters. We conduct experiments on tasks such as machine translation and question answering. Experimental results on four translation datasets of different languages show that MorphTE can compress word embedding parameters by about 20 times without performance loss and significantly outperforms related embedding compression methods.
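As a minimal sketch of the core idea, the snippet below composes a high-dimensional word vector from small morpheme vectors via the tensor (outer) product. The morpheme segmentation, dimensions, and function names are illustrative assumptions, not the paper's exact formulation (MorphTE additionally learns rank-wise combinations of such products).

```python
import numpy as np

# Hypothetical illustration: each morpheme gets a small trainable vector,
# and a word embedding is the flattened tensor product of its morpheme
# vectors. Dimensions and the segmentation of "unbreakable" are assumed.
rng = np.random.default_rng(0)

morpheme_dim = 8  # small morpheme vector size (assumption)
morphemes = {m: rng.standard_normal(morpheme_dim)
             for m in ["un", "break", "able"]}

def word_embedding(morpheme_seq):
    """Entangle morpheme vectors via successive outer products, flattened."""
    v = morphemes[morpheme_seq[0]]
    for m in morpheme_seq[1:]:
        v = np.outer(v, morphemes[m]).ravel()
    return v

emb = word_embedding(["un", "break", "able"])
print(emb.shape)  # (512,): an 8**3-dim word vector from only 3 x 8 parameters
```

This illustrates the source of the compression: storing 3 morpheme vectors of dimension 8 (24 parameters) suffices to generate a 512-dimensional word vector, and morphemes are shared across many words.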