Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, we find that these existing solutions are heavily affected by superficial features such as sentence length or syntactic structure. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which exploits the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo-token embedding layer, independent of the BERT encoder, that maps each sentence into a sequence of pseudo tokens of a fixed length. Leveraging these pseudo sequences, we construct same-length positive and negative pairs based on the attention mechanism to perform contrastive learning. In addition, we utilize both a gradient-updated and a momentum-updated encoder to encode instances, while dynamically maintaining an additional queue that stores previously encoded sentence representations, enhancing the encoder's learning from negative examples. Experiments show that our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks. Furthermore, analyses of alignment and uniformity losses, as well as hard examples with different sentence lengths and syntax, consistently verify the effectiveness of our method.
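To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch of (a) a fixed-length pseudo-token layer that attends over BERT token states, and (b) a MoCo-style contrastive objective with a momentum-updated encoder and a queue of negatives. All names and hyperparameters here (PseudoTokenAttention, num_pseudo, tau, the momentum coefficient) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the components described above; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PseudoTokenAttention(nn.Module):
    """Map variable-length token states to a fixed-length pseudo-token sequence."""

    def __init__(self, hidden: int = 768, num_pseudo: int = 128):
        super().__init__()
        # Learnable pseudo-token embeddings, kept independent of the BERT encoder.
        self.pseudo = nn.Parameter(torch.randn(num_pseudo, hidden) * 0.02)
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)

    def forward(self, token_states, attention_mask):
        # token_states: (B, L, H) from BERT; attention_mask: (B, L), 1 = real token.
        q = self.pseudo.unsqueeze(0).expand(token_states.size(0), -1, -1)
        out, _ = self.attn(q, token_states, token_states,
                           key_padding_mask=(attention_mask == 0))
        # Pool the fixed-length pseudo sequence into one normalized sentence vector.
        return F.normalize(out.mean(dim=1), dim=-1)


def contrastive_loss(query_vecs, key_vecs, queue, tau: float = 0.05):
    """InfoNCE over positives from the momentum encoder and queued negatives."""
    pos = (query_vecs * key_vecs).sum(dim=-1, keepdim=True)   # (B, 1)
    neg = query_vecs @ queue.t()                               # (B, K) queued negatives
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long)     # positive is index 0
    return F.cross_entropy(logits, labels)


@torch.no_grad()
def momentum_update(q_encoder: nn.Module, k_encoder: nn.Module, m: float = 0.999):
    """Slowly move the momentum (key) encoder toward the gradient-updated encoder."""
    for p_q, p_k in zip(q_encoder.parameters(), k_encoder.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```

Because every sentence is projected onto the same number of pseudo tokens before pooling, positive and negative pairs end up with identical sequence lengths, which is how the framework removes length and syntax as shortcut signals during contrastive training.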