Pre-trained models such as BERT have proven to be effective tools for Information Retrieval (IR) problems. Owing to their impressive performance, they are widely used to tackle real-world IR tasks such as document ranking. Recently, researchers have found that selecting "hard" rather than "random" negative samples is beneficial for fine-tuning pre-trained models on ranking tasks. However, how to leverage hard negative samples in a principled way remains elusive. To address this issue, we propose a fine-tuning strategy for document ranking, namely Self-Involvement Ranker (SIR), which dynamically selects hard negative samples to construct a high-quality semantic space for training a high-quality ranking model. Specifically, SIR consists of sequential compressors implemented with pre-trained models: each front compressor selects hard negative samples for the rear compressor. Moreover, SIR leverages a supervisory signal to adaptively adjust the semantic space of negative samples. Finally, the supervisory signal in the rear compressor is computed from conditional probability, which controls the sample dynamics and further enhances model performance. SIR is a lightweight and general framework for pre-trained models that simplifies the ranking process in industry practice. We evaluate our proposed solution on the MS MARCO document ranking task, and the results show that SIR significantly improves the ranking performance of various pre-trained models. Moreover, our method became the new state-of-the-art (SOTA) on the MS MARCO Document Ranking leaderboard (submitted anonymously) in May 2021.
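To make the "sequential compressors" idea concrete, the following is a minimal sketch of cascaded hard-negative selection. All names here (`score_fn`, `select_hard_negatives`, `compress_pipeline`) are hypothetical illustrations, not SIR's actual implementation; in SIR each compressor is a pre-trained ranking model, which we abstract as a generic query-document scoring function.

```python
# Hypothetical sketch of cascaded hard-negative selection, in the
# spirit of SIR's sequential compressors. Each stage ("compressor")
# narrows the negative pool that the next stage trains on.

def select_hard_negatives(query, candidates, score_fn, k):
    """Keep the k highest-scoring non-relevant candidates.

    A "hard" negative is a non-relevant document that the current
    model scores highly, i.e. one it confuses with relevant documents.
    """
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:k]


def compress_pipeline(query, negatives, stages, keep_fracs):
    """Apply each stage's scorer in sequence, shrinking the pool.

    stages     -- one scoring function per compressor (front to rear)
    keep_fracs -- fraction of the pool each stage keeps
    """
    pool = negatives
    for score_fn, frac in zip(stages, keep_fracs):
        k = max(1, int(len(pool) * frac))
        pool = select_hard_negatives(query, pool, score_fn, k)
    return pool
```

In practice the scorer at each stage would be a fine-tuned pre-trained model, and the surviving hard negatives form the training set for the next (rear) compressor.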