The use of contrastive loss for representation learning has become prominent in computer vision, and it is now receiving attention in Natural Language Processing (NLP). Here, we explore the idea of using a batch-softmax contrastive loss when fine-tuning large-scale pre-trained transformer models to learn better task-specific sentence embeddings for pairwise sentence scoring tasks. We introduce and study a number of variations in the calculation of the loss as well as in the overall training procedure; in particular, we find that data shuffling can be quite important. Our experimental results show sizable improvements on a number of datasets and pairwise sentence scoring tasks including classification, ranking, and regression. Finally, we offer detailed analysis and discussion, which should be useful for researchers aiming to explore the utility of contrastive loss in NLP.
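As a concrete illustration of the core idea, the sketch below shows a standard batch-softmax contrastive loss with in-batch negatives in PyTorch: each sentence pair in the batch contributes one positive, and all other pairings in the batch act as negatives under a softmax over scaled cosine similarities. This is a minimal sketch of the general technique, not the paper's exact formulation or any of its studied variants; the function name `batch_softmax_contrastive_loss`, the temperature value, and the embedding shapes are illustrative assumptions.

```python
# Minimal sketch of a batch-softmax contrastive loss with in-batch negatives.
# Assumes paired sentence embeddings from some transformer encoder; the names
# and hyperparameter values here are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F


def batch_softmax_contrastive_loss(emb_a: torch.Tensor,
                                    emb_b: torch.Tensor,
                                    temperature: float = 0.05) -> torch.Tensor:
    """emb_a, emb_b: [batch_size, dim] embeddings of the two sides of each pair.
    Row i of emb_a is matched with row i of emb_b; every other row in the
    batch serves as an in-batch negative."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    # Pairwise cosine similarities between all first- and second-side
    # embeddings in the batch, scaled by the temperature.
    logits = emb_a @ emb_b.t() / temperature          # [batch_size, batch_size]
    # The positive for row i sits on the diagonal, so the targets are 0..N-1.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Softmax cross-entropy over each row: push the paired similarity up,
    # push similarities to the other (negative) examples in the batch down.
    return F.cross_entropy(logits, targets)


# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    a = torch.randn(16, 768)  # e.g., pooled embeddings of the first sentences
    b = torch.randn(16, 768)  # embeddings of the paired second sentences
    print(batch_softmax_contrastive_loss(a, b).item())
```

Because the negatives come from the rest of the batch, the composition of each batch directly shapes the loss, which is one reason the data shuffling strategy discussed in the abstract can matter.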