Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such embeddings: properties shared by both sentences in a translation pair are likely semantic, while divergent properties are likely stylistic or language-specific. We propose a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector, and explaining what is left over with language-specific latent vectors. Our proposed approach differs from past work on semantic sentence encoding in two ways. First, by using a variational probabilistic framework, we introduce priors that encourage source separation, and can use our model's posterior to predict sentence embeddings for monolingual data at test time. Second, we use high-capacity transformers as both data generating distributions and inference networks -- contrasting with most past work on sentence embeddings. In experiments, our approach substantially outperforms the state-of-the-art on a standard suite of unsupervised semantic similarity evaluations. Further, we demonstrate that our approach yields the largest gains on more difficult subsets of these evaluations where simple word overlap is not a good indicator of similarity.
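To make the source-separation objective concrete, the following is a minimal sketch of the evidence lower bound such a model could maximize for a translation pair (x, y). It assumes one shared semantic latent z_sem, language-specific latents z_x and z_y, standard Gaussian priors, and a variational posterior that factorizes across the three latents; these are illustrative assumptions, since the abstract does not spell out the exact factorization.

\[
\log p_\theta(x, y) \;\geq\;
\mathbb{E}_{q_\phi(z_{\mathrm{sem}}, z_x, z_y \mid x, y)}\!\left[ \log p_\theta(x \mid z_{\mathrm{sem}}, z_x) + \log p_\theta(y \mid z_{\mathrm{sem}}, z_y) \right]
- \mathrm{KL}\!\left(q_\phi(z_{\mathrm{sem}} \mid x, y) \,\|\, p(z_{\mathrm{sem}})\right)
- \mathrm{KL}\!\left(q_\phi(z_x \mid x) \,\|\, p(z_x)\right)
- \mathrm{KL}\!\left(q_\phi(z_y \mid y) \,\|\, p(z_y)\right)
\]

Under this factorization, both reconstruction terms depend on z_sem, so the shared posterior is pushed toward content common to the two sentences, while z_x and z_y absorb language-specific and stylistic variation. At test time, a posterior over z_sem conditioned on a single sentence would then yield its embedding, in line with the abstract's description of handling monolingual data.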