Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine-tuning. This paper proposes a simple yet effective method to improve ranking performance on smaller datasets using supervised contrastive learning. We perform data augmentation by creating additional training instances from parts of the relevant documents in query-document pairs. We then use a supervised contrastive learning objective to learn an effective ranking model from the augmented dataset. Our experiments on subsets of the TREC-DL dataset show that, although data augmentation increases the training data size, it does not necessarily improve performance under existing pointwise or pairwise training objectives. However, our proposed supervised contrastive loss leads to performance improvements over the standard non-augmented setting, showcasing the utility of data augmentation when paired with contrastive losses. Finally, we demonstrate the real benefit of supervised contrastive learning objectives through marked improvements on smaller ranking datasets relating to news (Robust04), finance (FiQA), and scientific fact checking (SciFact).
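For reference, a supervised contrastive objective of the kind invoked here typically follows the general form of Khosla et al. (2020); the equation below is that standard formulation, not necessarily the exact variant used in this paper. Here $z_i$ is the (normalized) representation of training instance $i$, $P(i)$ is the set of in-batch positives sharing $i$'s label (e.g., augmented parts of the same relevant document for a given query), $A(i)$ is the set of all other instances in the batch, and $\tau$ is a temperature hyperparameter:

\[
\mathcal{L}_{\mathrm{sup}} \;=\; \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\!\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z_i \cdot z_a / \tau\right)}
\]

Intuitively, the loss pulls representations of instances with the same relevance label together while pushing apart the rest of the batch, which is what lets the augmented document parts act as additional positive signal.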