Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such embeddings: properties shared by both sentences in a translation pair are likely semantic, while divergent properties are likely stylistic or language-specific. We propose a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector, and explaining what is left over with language-specific latent vectors. Our proposed approach differs from past work on semantic sentence encoding in two ways. First, by using a variational probabilistic framework, we introduce priors that encourage source separation, and can use our model's posterior to predict sentence embeddings for monolingual data at test time. Second, we use high-capacity transformers as both data generating distributions and inference networks -- contrasting with most past work on sentence embeddings. In experiments, our approach substantially outperforms the state-of-the-art on a standard suite of unsupervised semantic similarity evaluations. Further, we demonstrate that our approach yields the largest gains on more difficult subsets of these evaluations where simple word overlap is not a good indicator of similarity.
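To make the source-separation objective concrete, the following is a minimal sketch of the evidence lower bound such a model could maximize for a translation pair (x, y). It assumes one shared semantic latent z_sem, language-specific latents z_x and z_y, standard Gaussian priors, and a variational posterior that factorizes across the three latents; these are illustrative assumptions, since the abstract does not spell out the exact factorization.

\[
\log p_\theta(x, y) \;\geq\;
\mathbb{E}_{q_\phi(z_{\mathrm{sem}}, z_x, z_y \mid x, y)}\!\left[ \log p_\theta(x \mid z_{\mathrm{sem}}, z_x) + \log p_\theta(y \mid z_{\mathrm{sem}}, z_y) \right]
- \mathrm{KL}\!\left(q_\phi(z_{\mathrm{sem}} \mid x, y) \,\|\, p(z_{\mathrm{sem}})\right)
- \mathrm{KL}\!\left(q_\phi(z_x \mid x) \,\|\, p(z_x)\right)
- \mathrm{KL}\!\left(q_\phi(z_y \mid y) \,\|\, p(z_y)\right)
\]

Under this factorization, both reconstruction terms depend on z_sem, so the shared posterior is pushed toward content common to the two sentences, while z_x and z_y absorb language-specific and stylistic variation. At test time, a posterior over z_sem conditioned on a single sentence would then yield its embedding, in line with the abstract's description of handling monolingual data.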