Contrastive learning has been studied for improving the performance of sentence embedding learning. The current state-of-the-art method, SimCSE, uses dropout as a data augmentation method: it feeds the same input sentence to a pre-trained Transformer encoder twice, and the two sentence embeddings derived from different dropout masks form a positive pair. A network with a dropout mask applied can be regarded as a sub-network of itself, whose expected scale is determined by the dropout rate. In this paper, we push most sub-networks with different expected scales to learn similar embeddings for the same sentence. SimCSE cannot do so because it fixes the dropout rate to a single tuned value, whereas we sample a dropout rate for each dropout function. As this makes optimization more difficult, we also propose a simple sentence-wise mask strategy to sample more sub-networks. We evaluate the proposed S-SimCSE on several popular semantic textual similarity datasets. Experimental results show that S-SimCSE outperforms the state-of-the-art SimCSE by more than $1\%$ on BERT-base.
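To make the sampled-dropout idea concrete, the following is a minimal sketch (not the authors' released code) of the mechanism described above: the same batch is encoded twice, but the dropout rate is re-sampled before each pass, so the two views come from sub-networks with different expected scales. It assumes a HuggingFace-style encoder exposing `pooler_output`; helper names such as `set_dropout` and `sample_dropout_rate`, and the sampling range, are illustrative assumptions, and the paper's sentence-wise mask strategy is omitted for brevity.

```python
# Sketch of sampling a dropout rate per forward pass, as described in the abstract.
# Assumes `encoder` is a HuggingFace BERT-style model (e.g. transformers.BertModel).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


def set_dropout(model: nn.Module, p: float) -> None:
    """Overwrite the probability of every nn.Dropout module in the encoder."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p


def sample_dropout_rate(low: float = 0.05, high: float = 0.20) -> float:
    """Draw a dropout rate uniformly from an assumed range (illustrative values)."""
    return random.uniform(low, high)


def two_views(encoder: nn.Module, input_ids: torch.Tensor,
              attention_mask: torch.Tensor):
    """Encode the same sentences twice under independently sampled dropout rates."""
    encoder.train()  # dropout must stay active to produce two different views
    set_dropout(encoder, sample_dropout_rate())
    z1 = encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
    set_dropout(encoder, sample_dropout_rate())
    z2 = encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
    return z1, z2


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Standard in-batch contrastive loss over the two views (as in SimCSE)."""
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```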