Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as classification and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they can not only directly output a ranking score for each query-document pair, but also be fine-tuned with "pairwise" or "listwise" ranking losses to optimize ranking performance. Our experiments show that the proposed models with ranking losses achieve substantial ranking performance gains on different public text ranking data sets. Moreover, when fine-tuned with listwise ranking losses, the ranking model appears to have better zero-shot ranking performance on out-of-domain data sets than the model fine-tuned with classification losses.
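As a concrete illustration of the "listwise" ranking losses referred to above, the sketch below shows a generic softmax cross-entropy listwise loss computed over the predicted scores of one query's candidate documents. The function name and the NumPy implementation are illustrative assumptions for exposition, not the paper's actual training code.

```python
import numpy as np

def softmax_cross_entropy_listwise_loss(scores, labels):
    """Generic listwise softmax cross-entropy loss for one query.

    scores: model-produced ranking scores for the candidate documents, shape (num_docs,)
    labels: relevance labels for the same documents, shape (num_docs,)
    """
    # Softmax over all candidate documents of the query.
    exp_scores = np.exp(scores - np.max(scores))
    probs = exp_scores / exp_scores.sum()
    # Weight each document's log-probability by its relevance label, so the loss
    # decreases when relevant documents receive higher scores than the rest.
    return -np.sum(labels * np.log(probs + 1e-12))

# Example: one query with three candidates, the first being relevant.
loss = softmax_cross_entropy_listwise_loss(
    scores=np.array([2.1, 0.3, -1.0]),
    labels=np.array([1.0, 0.0, 0.0]),
)
```

Unlike a per-pair classification loss, this loss couples all candidates of a query through the softmax, which is what makes it "listwise".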