In this paper, we propose a method for obtaining sentence-level embeddings. While word-level embeddings are very well studied, sentence-level embeddings have received comparatively less attention; we obtain them with a simple method in the context of solving the paraphrase generation task. When a sequential encoder-decoder model is used to generate paraphrases, we would like each generated paraphrase to be semantically close to the original sentence. One way to ensure this is to add constraints that pull the embeddings of true paraphrases close together and push the embeddings of unrelated candidate sentences apart. We enforce this with a sequential pair-wise discriminator that shares weights with the encoder and is trained with a suitable loss function. Our loss function penalizes the embeddings of paraphrase sentence pairs when their distance is too large, and is used in combination with a sequential encoder-decoder network. We also validate the method by evaluating the obtained embeddings on a sentiment analysis task. The proposed method yields semantic embeddings and outperforms the state-of-the-art on paraphrase generation and sentiment analysis tasks on standard datasets, and the improvements are statistically significant.
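The abstract does not give the exact form of the pair-wise loss; a minimal sketch of one common formulation consistent with the description (a margin-based contrastive loss that pulls true paraphrase embeddings together and pushes unrelated candidate embeddings apart) might look as follows. The function name, margin value, and use of Euclidean distance are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def pairwise_margin_loss(anchor, positive, negative, margin=1.0):
    """Illustrative margin-based pair-wise loss (not the paper's exact loss).

    anchor, positive: embeddings of a sentence and its true paraphrase
                      (penalized when far apart).
    negative:         embedding of an unrelated candidate sentence
                      (penalized when too close to the anchor).
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to true paraphrase
    d_neg = np.linalg.norm(anchor - negative)  # distance to unrelated sentence
    # Zero loss once the unrelated sentence is at least `margin` farther away
    # than the true paraphrase.
    return max(0.0, d_pos - d_neg + margin)
```

In practice the distances would be computed over batches of encoder outputs, and the gradient of this loss (flowing through the weight-shared discriminator) shapes the sentence embeddings during paraphrase-generation training.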