Reranking algorithms have improved document retrieval quality by efficiently aggregating relevance judgments generated by large language models (LLMs). However, identifying relevant documents for queries that require in-depth reasoning remains a major challenge. Reasoning-intensive queries often exhibit multifaceted information needs and nuanced interpretations, rendering document relevance inherently context-dependent. To address this, we propose contextual relevance, which we define as the probability that a document is relevant to a given query, marginalized over the distribution of reranking contexts in which it may appear (i.e., the set of candidate documents it is ranked alongside and the order in which those documents are presented to the reranking model). While prior work has studied how to mitigate the positional bias LLMs exhibit by accounting for document ordering, we empirically find that the composition of these candidate batches also plays an important role in reranking performance. To estimate contextual relevance efficiently, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10 over retrieval and reranking baselines by 15-25% on BRIGHT and 6-21% on BEIR, highlighting the importance of modeling relevance as context-dependent.
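To make the definition above concrete, one plausible formalization (our illustrative notation; the paper's own may differ) writes the contextual relevance of a document $d$ for a query $q$ as an expectation over reranking contexts:

$$\mathrm{rel}(d \mid q) \;=\; \mathbb{E}_{c \sim P(\mathcal{C})}\bigl[\, p(\text{relevant} \mid q, d, c) \,\bigr], \qquad c = (S, \pi),$$

where $S$ is the set of candidate documents the batch contains and $\pi$ is the order in which that batch is presented to the reranking model. Marginalizing over $c$ captures both sources of context dependence named in the abstract: which documents a candidate is compared against, and where it sits in the prompt.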
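As an illustration of the sampling-based, uncertainty-aware estimation the abstract describes, the minimal sketch below shows one way such a reranker could be structured, assuming binary LLM relevance judgments, Beta posteriors over per-document relevance, and a Thompson-sampling-style batch selection. The function and parameter names (ts_set_rerank, llm_judge, n_rounds) are hypothetical, and TS-SetRank's actual procedure may differ.

```python
import random

def ts_set_rerank(query, docs, llm_judge, n_rounds=50, batch_size=10, seed=0):
    """Illustrative sketch (not necessarily the paper's algorithm): estimate
    each document's contextual relevance by judging it inside randomly
    sampled batch contexts, tracking uncertainty with Beta posteriors."""
    rng = random.Random(seed)
    alpha = {d: 1.0 for d in docs}  # Beta posterior: 1 + observed "relevant" votes
    beta = {d: 1.0 for d in docs}   # Beta posterior: 1 + observed "irrelevant" votes
    for _ in range(n_rounds):
        # Thompson step: draw a plausible relevance score from each posterior.
        sampled = {d: rng.betavariate(alpha[d], beta[d]) for d in docs}
        # Judge the documents whose draws rank highest (exploit + explore).
        batch = sorted(docs, key=lambda d: sampled[d], reverse=True)[:batch_size]
        rng.shuffle(batch)  # randomize presentation order within the context
        for d, rel in llm_judge(query, batch).items():  # assumed {doc: 0 or 1}
            alpha[d] += rel
            beta[d] += 1 - rel
    # Posterior means approximate relevance marginalized over sampled contexts.
    return sorted(docs, key=lambda d: alpha[d] / (alpha[d] + beta[d]), reverse=True)
```

A caller would supply llm_judge as a wrapper around the reranking prompt that returns a binary relevance label for each document in the batch; because each round resamples both the batch membership and its order, repeated judgments approximate the marginalization over contexts in the definition above.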