We propose a new kind of embedding for natural language text that captures semantic meaning in depth. Standard text embeddings use the outputs of hidden layers of a pretrained language model. In our method, we let a language model learn from the text and then literally pick its brain, taking the actual weights of the model's neurons to generate a vector. We call this representation of the text a neural embedding. We confirm the ability of this representation to reflect the semantics of the text through an analysis of its behavior on several datasets, and through a comparison of neural embeddings with state-of-the-art sentence embeddings.
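A minimal sketch of the idea, under assumptions not specified in the abstract: fine-tune a small pretrained language model on the input text, then read a subset of the updated weights back as a fixed-length vector. The model choice (distilgpt2), number of steps, learning rate, and the decision to read only the final transformer block are all illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: every hyperparameter and weight-selection choice
# below is an assumption, not the paper's exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def neural_embedding(text: str, model_name: str = "distilgpt2",
                     steps: int = 10, lr: float = 1e-4) -> torch.Tensor:
    """Fine-tune a small LM on `text`, then return its weights as a vector."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    inputs = tokenizer(text, return_tensors="pt")
    for _ in range(steps):
        # Let the model "learn from the text" via its language-modeling loss.
        outputs = model(**inputs, labels=inputs["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # "Pick its brain": concatenate the updated weights into one vector.
    # Reading only the final transformer block keeps the vector small (assumption).
    last_block = model.transformer.h[-1]
    return torch.cat([p.detach().flatten() for p in last_block.parameters()])

emb_a = neural_embedding("The cat sat on the mat.")
emb_b = neural_embedding("A kitten rested on the rug.")
# Vectors of this kind could then be compared with standard similarity measures,
# e.g. cosine similarity, to probe whether they reflect semantics.
print(torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0))
```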